Abstract

Sparse matrix-vector multiplication (SpMV) is an essential kernel in sparse linear algebra and has been studied extensively across modern processor and accelerator architectures. Compressed Sparse Row (CSR) is a frequently used storage format for sparse matrices. However, CSR-based SpMV performs poorly on processors with vector units. To take full advantage of SIMD acceleration in SpMV, we propose a new matrix storage format called CSR-SIMD. The new format compresses the non-zero elements into many variable-length data fragments, each with consecutive memory access addresses. This improves the data locality of the sparse matrix A and the dense vector x, and the floating-point operations for each fragment can be fully vectorized on wide SIMD units. Our experimental results indicate that CSR-SIMD has better storage efficiency and low format-conversion overhead. In addition, the new format scales well on wide SIMD units. Compared with CSR-based and BCSR-based SpMV, CSR-SIMD achieves better performance on FT1500A, Intel Xeon, and Intel Xeon Phi.
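
To make the contrast concrete, the following is a minimal sketch in Python, not the CSR-SIMD implementation itself. It shows a baseline CSR SpMV, whose indirect access `x[col_idx[j]]` is what hinders vectorization, next to one possible reading of "fragments with consecutive memory access addresses": runs of non-zeros within a row whose column indices are consecutive. The helper names `split_into_fragments` and `fragment_spmv` and the fragment tuple layout are assumptions made for illustration; the actual CSR-SIMD layout may differ.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Baseline CSR SpMV: returns y = A * x."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for j in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect (gather) access to x: hard to vectorize efficiently.
            acc += vals[j] * x[col_idx[j]]
        y[i] = acc
    return y


def split_into_fragments(row_ptr, col_idx):
    """Hypothetical helper: group each row's non-zeros into runs of
    consecutive column indices.

    Each fragment is (row, start_col, offset_into_vals, length).
    This tuple layout is an assumption, not the paper's format.
    """
    frags = []
    for i in range(len(row_ptr) - 1):
        j, end = row_ptr[i], row_ptr[i + 1]
        while j < end:
            k = j + 1
            while k < end and col_idx[k] == col_idx[k - 1] + 1:
                k += 1
            frags.append((i, col_idx[j], j, k - j))
            j = k
    return frags


def fragment_spmv(n_rows, frags, vals, x):
    """SpMV over fragments: each fragment reads contiguous slices of
    vals and x, which is the access pattern SIMD units handle well."""
    y = [0.0] * n_rows
    for row, start_col, off, length in frags:
        a = vals[off:off + length]
        b = x[start_col:start_col + length]
        y[row] += sum(av * bv for av, bv in zip(a, b))
    return y


# Small 3x5 example matrix in CSR form (illustrative data).
row_ptr = [0, 3, 6, 7]
col_idx = [0, 1, 2, 1, 3, 4, 4]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0]

frags = split_into_fragments(row_ptr, col_idx)
y_csr = csr_spmv(row_ptr, col_idx, vals, x)
y_frag = fragment_spmv(len(row_ptr) - 1, frags, vals, x)
```

In this sketch, each fragment's inner product runs over contiguous slices of `vals` and `x`, so a compiled implementation could use plain vector loads instead of gathers. Both routines produce the same result vector on the example data.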