Abstract

With the rapid development of social media applications, a large number of users are connected with friends online, and their daily lives and opinions are recorded. Social media gives us an unprecedented way to collect and analyze information about billions of users. Accurate user attribute identification, or profile inference, is therefore becoming increasingly attractive and feasible. However, the abundance of social records also poses great challenges for effective feature selection and integration in user profile inference, mainly because of the diversity of the text and the complexity of community structures. In this paper, we propose a comprehensive multi-source integration framework that infers a user's occupation from his/her social activities recorded in a micro-blog system by combining content and network information. We first identify a set of beneficial content features and propose a machine learning classification model, named the content model. We then exploit the social network information in two ways: we tailor a community-discovery-based latent dimension solution to extract community-based features, and we use the predictions of a user's neighbors to update the inference. Extensive empirical studies are conducted on a large real-life micro-blog dataset. The experimental results demonstrate the superiority of our integrated model on the occupation inference task, verify the effect of homophily in user interaction records, and reveal the different effects of heterogeneous interaction networks.