An efficient graph data processing system for large-scale social network service applications

Authors: Zhou, Wei; Han, Jizhong*; Gao, Yun; Xu, Zhiyong
Source: Concurrency and Computation: Practice and Experience (CCPE), 2016, 28(3): 729-747.
DOI:10.1002/cpe.3393

Abstract

Trust in social networks is drawing increasing attention from both academia and industry. Public opinion analysis is a direct way to increase trust in social networks. Because public opinion analysis can be expressed naturally as graph algorithms, and graph data is the default data organization mechanism in large-scale social network service applications, a growing body of research applies graph processing systems to public opinion analysis. As data volumes grow rapidly, distributed graph systems have been introduced to handle large-scale public opinion analysis. Most graph algorithms involve a large number of iterations, so the synchronization required between successive iterations can severely reduce the effectiveness of parallel operations, which slows down data aggregation and analysis. In this paper, we propose a large-scale graph data processing system to address these issues, which includes a graph data processing model, Arbor. Arbor develops a new graph data organization format to represent social relationships; the format both saves storage space and accelerates graph data processing operations. Furthermore, Arbor replaces time-constrained synchronization operations with non-time-constrained control message transmissions to increase the degree of parallelism. On top of the system, we implement two of the most frequently used graph applications on Arbor: shortest path and PageRank. To evaluate the system, we compare Arbor with other graph processing systems using large-scale experimental graph data, and the results show that it outperforms state-of-the-art systems.
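
To make the synchronization issue concrete, the following is a minimal single-machine sketch of iterative PageRank, one of the two applications named in the abstract. It is not Arbor's implementation: the function name `pagerank`, the dict-of-lists graph representation, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration. The point to notice is that each iteration reads only the ranks produced by the previous one, which is the kind of per-iteration barrier that a distributed system has to synchronize on.

```python
def pagerank(adj, damping=0.85, iterations=50):
    """Iterative PageRank over a dict mapping each node to its out-neighbors.

    Illustrative sketch only. Every edge target must also appear as a key.
    """
    nodes = list(adj)
    n = len(nodes)
    # Start from a uniform rank distribution.
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(iterations):
        # Every node receives the teleport share first.
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = adj[v]
            if out:
                # Split this node's rank evenly among its out-neighbors.
                share = damping * rank[v] / len(out)
                for w in out:
                    nxt[w] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[v] / n
                for w in nodes:
                    nxt[w] += share
        # Implicit barrier: the next iteration sees only completed ranks.
        rank = nxt
    return rank


if __name__ == "__main__":
    # Hypothetical toy graph: a and b both link to c, c links back to a.
    graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for node, score in sorted(pagerank(graph).items()):
        print(node, round(score, 4))
```

In this sketch the assignment `rank = nxt` plays the role of the synchronization point: no vertex may start iteration k+1 until every vertex has finished iteration k. In a distributed setting that barrier becomes a cluster-wide wait, which is the cost the abstract says Arbor reduces by using control messages instead.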