A Fast Parallel Community Discovery Model on Complex Networks Through Approximate Optimization

Authors: Qiao, Shaojie; Han, Nan*; Gao, Yunjun*; Li, Rong-Hua; Huang, Jianbin; Guo, Jun; Gutierrez, Louis Alberto; Wu, Xindong
Source: IEEE Transactions on Knowledge and Data Engineering, 2018, 30(9): 1638-1651.
DOI:10.1109/TKDE.2018.2803818

Abstract

Community discovery plays an essential role in analyzing the structural features of complex networks. Because online networks grow increasingly large and complex over time, traditional community discovery methods cannot handle large-scale network data efficiently. This raises the important problem of how to discover large communities in complex networks both effectively and efficiently. In this study, we propose a fast parallel community discovery model called picaso (a parallel community discovery algorithm based on approximate optimization), which integrates two new techniques: (1) the Mountain model, which uses graph theory to approximate the selection of nodes to be merged, and (2) the Landslide algorithm, which updates the modularity increment based on the approximate optimization. In addition, the GraphX distributed computing framework is employed to achieve parallel community detection over complex networks. In the proposed model, modularity-based clustering is used to initialize the Mountain model and to compute the weight of each edge in the network. The relationships among the communities are then simplified by applying the Landslide algorithm, which yields the community structure of the complex network. Extensive experiments were conducted on real and synthetic complex network datasets, and the results demonstrate that the proposed algorithm can outperform state-of-the-art methods in both effectiveness and efficiency on the community detection problem. Moreover, we show that, overall, it runs approximately four times faster than similar approaches. Our results suggest a new paradigm for large-scale community discovery in complex networks.
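
The abstract refers to clustering on modularity and to updating a modularity increment, but gives no formulas. As background only, the following is a minimal sketch of the standard Newman modularity Q and of the standard increment obtained when two communities are merged, as used by greedy agglomerative methods. It is not the Mountain model or the Landslide algorithm, whose details are not given here. The function names `modularity` and `merge_gain` and the toy graph are hypothetical, and the graph is assumed to be undirected, unweighted, and free of self-loops.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q = sum_c [ l_c / m - (d_c / (2m))^2 ].

    edges        : list of (u, v) pairs, one per undirected edge (assumed no self-loops)
    community_of : dict mapping each node to its community label
    """
    m = len(edges)
    internal = defaultdict(int)    # l_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: sum of the degrees of the nodes in c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def merge_gain(edges, community_of, a, b):
    """Change in Q if distinct communities a and b were merged.

    Delta Q = l_ab / m - d_a * d_b / (2 m^2), where l_ab counts the edges
    running between a and b.
    """
    m = len(edges)
    between = 0
    deg = {a: 0, b: 0}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        if cu in deg:
            deg[cu] += 1
        if cv in deg:
            deg[cv] += 1
        if {cu, cv} == {a, b}:
            between += 1
    return between / m - deg[a] * deg[b] / (2 * m * m)


# Toy graph (hypothetical): two triangles joined by the single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

q_split = modularity(edges, two_groups)           # 5/14, about 0.357
gain = merge_gain(edges, two_groups, "A", "B")    # negative: merging lowers Q
print(q_split, gain)
```

In this toy case the increment is negative, so a greedy modularity method would keep the two triangles as separate communities; a merge is attractive only when the increment is positive.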