摘要

Traditional approaches to community detection, as studied by physicists, sociologists, and more recently computer scientists, aim at simply partitioning the social network graph. However, with the advent of online social networking sites, richer data has become available: beyond the link information, each user in the network is annotated with additional information, for example, demographics, shopping behavior, or interests. In this context, it is therefore important to develop mining methods which can take advantage of all available information. In the case of community detection, this means finding good communities (a set of nodes cohesive in the social graph) which are associated with good descriptions in terms of user information (node attributes). Having good descriptions associated to our models make them understandable by domain experts and thus more useful in real-world applications. Another requirement dictated by real-world applications, is to develop methods that can use, when available, any domain-specific background knowledge. In the case of community detection the background knowledge could be a vague description of the communities sought in a specific application, or some prototypical nodes (e.g., good customers in the past), that represent what the analyst is looking for (a community of similar users). Towards this goal, in this article, we define and study the problem of finding a diverse set of cohesive communities with concise descriptions. We propose an effective algorithm that alternates between two phases: a hill-climbing phase producing (possibly overlapping) communities, and a description induction phase which uses techniques from supervised pattern set mining. Our framework has the nice feature of being able to build well-described cohesive communities starting from any given description or seed set of nodes, which makes it very flexible and easily applicable in real-world applications. Our experimental evaluation confirms that the proposed method discovers cohesive communities with concise descriptions in realistic and large online social networks such as DELICIOUS, FLICKR, and LASTFM.

  • 出版日期2014-4