A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations

Authors: Aktulga Hasan Metin*; Afibuzzaman Md; Williams Samuel; Buluc Aydin; Shao Meiyue; Yang Chao; Ng Esmond G; Maris Pieter; Vary James P
Source: IEEE Transactions on Parallel and Distributed Systems, 2017, 28(6): 1550-1563.
DOI: 10.1109/TPDS.2016.2630699

Abstract

As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. We consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM) and tall-skinny matrix operations. We present techniques to significantly improve the SpMM and the transpose operation SpMM^T by using the compressed sparse blocks (CSB) format. We achieve 3-4x speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15x speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4x to 1.8x speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.
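
For readers unfamiliar with the two sparse kernels named in the abstract, the following is a minimal sketch of SpMM and SpMM^T over the baseline CSR layout. It is not the authors' CSB implementation and makes no attempt at the cache or thread-level optimizations the paper studies; the function names `csr_spmm` and `csr_spmm_t` and the small 3 x 3 example matrix are assumptions made purely for illustration.

```python
# Illustrative only: plain-Python SpMM and SpMM^T on a CSR matrix.
# Names and the example matrix are hypothetical, not from the paper.

def csr_spmm(indptr, indices, data, X):
    """Return Y = A @ X, with A in CSR form and X a dense n x k block of vectors."""
    n_rows = len(indptr) - 1
    k = len(X[0])
    Y = [[0.0] * k for _ in range(n_rows)]
    for i in range(n_rows):
        yi = Y[i]
        # Nonzeros of row i live in data[indptr[i]:indptr[i + 1]].
        for p in range(indptr[i], indptr[i + 1]):
            a, xj = data[p], X[indices[p]]
            for c in range(k):
                yi[c] += a * xj[c]
    return Y


def csr_spmm_t(indptr, indices, data, X, n_cols):
    """Return Y = A.T @ X using the same CSR arrays, without forming A.T."""
    n_rows = len(indptr) - 1
    k = len(X[0])
    Y = [[0.0] * k for _ in range(n_cols)]
    for i in range(n_rows):
        xi = X[i]
        for p in range(indptr[i], indptr[i + 1]):
            # Entry A[i, j] contributes to output row j (scattered writes).
            a, yj = data[p], Y[indices[p]]
            for c in range(k):
                yj[c] += a * xi[c]
    return Y


# Small symmetric example: A = [[2, 0, 1], [0, 3, 0], [1, 0, 4]].
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [2.0, 1.0, 3.0, 1.0, 4.0]

# Block of k = 2 vectors stored row-wise (n x k), as in a block eigensolver.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

Y = csr_spmm(indptr, indices, data, X)
Yt = csr_spmm_t(indptr, indices, data, X, n_cols=3)
```

In the CSR form, SpMM writes each output row once, while SpMM^T scatters updates across output rows; that asymmetry is one common reason a layout that treats both directions more evenly can be attractive. Because the example matrix is symmetric, `Y` and `Yt` agree here.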

  • Publication date: 2017-06