Abstract

User identification is very helpful for building a better profile of a user, and several studies have addressed this problem. However, the existing methods that perform well rely mainly on rich online data and do not consider the cost of acquiring that data. In this paper, we aim to address user identification at a lower cost of data acquisition. We propose a machine learning-based solution that relies solely on users' display names. It consists of three key steps: first, we analyze the users' unique naming patterns, which lead to information redundancies across sites; second, we construct features that exploit these information redundancies; third, we apply a machine learning method for user identification. Experiments show that the proposed solution performs well, with F1 scores reaching 96.24%, 92.49%, and 90.68% on three different real-world data sets, respectively. This paper demonstrates that user identification is possible at a lower cost of data acquisition.
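As a rough illustration of the second step, the sketch below computes a few similarity features for a pair of display names. It is not the feature set used in this paper: the function names `normalize` and `name_features`, the choice of features, and the normalization rule are all assumptions made for illustration. In practice, vectors like these would be passed to a trained classifier, which is omitted here.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a display name and keep only letters and digits (assumed rule)."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def name_features(a: str, b: str) -> dict:
    """Hypothetical similarity features for two display names from different sites."""
    x, y = normalize(a), normalize(b)
    matcher = SequenceMatcher(None, x, y, autojunk=False)
    # Length of the longest contiguous block shared by both names.
    longest = matcher.find_longest_match(0, len(x), 0, len(y)).size
    max_len = max(len(x), len(y), 1)
    union = set(x) | set(y)
    return {
        "exact_match": float(x == y and x != ""),
        "similarity_ratio": matcher.ratio(),
        "longest_common_substring": longest / max_len,
        "length_difference": abs(len(x) - len(y)) / max_len,
        "char_overlap": len(set(x) & set(y)) / max(len(union), 1),
    }


if __name__ == "__main__":
    # A user who appends digits on one site still shares a long common substring.
    print(name_features("JohnSmith88", "john_smith"))
```

Names that a user reuses with small variations, such as added digits or separators, tend to score high on the substring and ratio features, which is the kind of redundancy a classifier could learn from.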