摘要

The presence of triangles in massive graphs provides many important uses in different graph algorithms, such as finding highly relevant vertices for dense subgraph mining, measuring the clustering coefficient, and computing the transitivity for network analysis. In memory algorithms cannot be used for triangle listing in massive graphs because the graphs are too large to fit into memory. External memory-based techniques address this problem by focusing on the I/O efficiency to improve performance. However, triangulation is a CPU intensive process that iteratively joins lists of neighbors to determine the adjacent vertices in each triangle. Therefore, the cost of a triangle listing algorithm on a massive graph is dominated by the join operations among the lists of neighbors. In this paper, we propose a disk-based triangle listing approach that uses an efficient technique to join the lists of neighbors by exploiting CPU parallelism through bitwise operations. We represent the lists of neighbors using bit vectors and compress them using our proposed summarized bit batch, which allows the bitwise operations to be performed directly on the compressed data. Our proposed technique slices a bit vector into a series of word length bit batches that it summarizes by pruning the bit batches that contain only 0-bits. Then our proposed approach for listing the triangles asynchronously accesses the summarized bit batches and joins them using bitwise operations. Our proposed technique achieves 40% higher compression for some real world datasets compared to the classic compression technique. The triangulation technique using the summarized bit batches also significantly outperforms the existing solutions in terms of wall clock time.

  • 出版日期2018-5