A task-based approach for finding SCCs in real-world graphs on external memory

作者:Lv, Huiming*; Shao, Zhiyuan; Li, Lang; Shi, Xuanhua; Jin, Hai
来源:Concurrency and Computation: Practice and Experience (CCPE) , 2017, 29(24): e4164.
DOI:10.1002/cpe.4164

摘要

Finding strongly connected components (SCCs) in graphs is one of the important research topics of graph data mining. Traditional SCC-finding methods need to load the whole graph into main memory before actual processing, which makes them inappropriate to process today's large graphs. Although it is cost-effective to conductive SCC-finding in the external memory (EM) graph processing systems, existing EM systems are still inefficient on such workload for 2 reasons: data-parallel processing model and inefficient graph mutation mechanism. In this paper, we propose a task-based approach, named as TAS, for finding SCCs in large real-world graphs. TAS encapsulates individual graph dataset and the SCC-finding algorithms conducted on it as an independent task. By this strategy, different algorithms can be assigned to different datasets according to the stage of processing or the size of the dataset. By gradually removing unneeded data, ie, the known SCCs, with an efficient graph mutation method, TAS continuously reduces the scale of the problem to accelerate processing. Performance evaluation on large real-world graphs show that TAS greatly outperforms existing EM solutions on finding SCCs (eg, 55.2x faster than GraphChi and 40.3x faster than X-Stream).

全文