Supernodal sparse Cholesky factorization on graphics processing units

Authors: Zou, Dan*; Dou, Yong; Guo, Song; Li, Rongchun; Deng, Lin
Source: Concurrency and Computation: Practice and Experience, 2014, 26(16): 2713-2726.
DOI:10.1002/cpe.3158

Abstract

Sparse Cholesky factorization is the most computationally intensive component in solving large sparse linear systems and is the core algorithm of numerous scientific computing applications. Many sparse Cholesky factorization algorithms have been developed, each exploiting the architectural features of a particular computing platform. The recent use of graphics processing units (GPUs) to accelerate structured parallel applications shows the potential to achieve significant acceleration relative to desktop performance. However, sparse Cholesky factorization on GPUs has not been explored sufficiently because of the complexity of an efficient implementation and concerns about low GPU utilization. In this paper, we present a new approach for sparse Cholesky factorization on GPUs. We present the organization of the sparse matrix supernode data structure for the GPU and propose a queue-based approach for generating and scheduling GPU tasks composed of dense linear algebra operations. We also design a subtree-based parallel method for multi-GPU systems. These approaches increase GPU utilization and thus substantially reduce computation time. Comparisons are made with existing parallel solvers using problems arising from practical applications. The experimental results show that the proposed approaches can substantially improve sparse Cholesky factorization performance on GPUs. Relative to a highly optimized parallel algorithm on a 12-core node, we obtained speedups ranging from 1.59x to 2.31x with one GPU and from 1.80x to 3.21x with two GPUs. Relative to a state-of-the-art solver based on the supernodal method for CPU-GPU heterogeneous platforms, we obtained speedups ranging from 1.52x to 2.30x with one GPU and from 2.15x to 2.76x with two GPUs. Concurrency and Computation: Practice and Experience, 2013.
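
The abstract does not spell out how the queue-based scheduling works, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the supernodes are arranged in an elimination tree, given as a hypothetical `parent` array, and that a supernode becomes ready for its dense operations once all of its children have been processed. The function name `schedule_supernodes` and the example tree are illustrative assumptions.

```python
from collections import deque


def schedule_supernodes(parent):
    """Return one valid processing order for supernodes.

    parent[j] is the parent of supernode j in the elimination tree
    (assumed input), or -1 if j is a root. A supernode is queued as
    soon as all of its children have been processed.
    """
    n = len(parent)

    # Count how many children each supernode is still waiting for.
    pending_children = [0] * n
    for p in parent:
        if p != -1:
            pending_children[p] += 1

    # Leaves have no dependencies, so they seed the ready queue.
    ready = deque(j for j in range(n) if pending_children[j] == 0)

    order = []
    while ready:
        j = ready.popleft()
        # Placeholder: a real solver would launch the dense
        # factorization and update kernels for supernode j here.
        order.append(j)

        p = parent[j]
        if p != -1:
            pending_children[p] -= 1
            if pending_children[p] == 0:
                ready.append(p)
    return order


# Hypothetical tree: supernodes 0 and 1 feed 2; supernodes 2 and 3 feed root 4.
example_parent = [2, 2, 4, 4, -1]
example_order = schedule_supernodes(example_parent)
```

In this sketch, independent subtrees (for example, the subtree rooted at supernode 2 and the leaf 3) never wait on each other, which is the property a subtree-based split across several GPUs would rely on.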