摘要

SpMV(sparse matrix-vector multiplication) is an important computational kernel in scientific applications. Its performance is mainly affected by the nonzero distribution of sparse matrices. In this paper, we propose a new storage format for diagonal sparse matrices, called CRSD(compressed row segmented with diagonal-pattern). We design a diagonal pattern to represent the diagonal distribution. Unlike general-purpose methods, our CRSD based SpMV implementation is application specific since it needs to sample the sparse matrix of one given application. The optimal implementation is selected automatically for a given hardware platform during the software installation phase. The information collected during auto-tuning process is also utilized for parallel implementation to maintain load-balance. The implementation is also tuned on GPU platform. Experimental results demonstrate that the speedup reaches up to 2.37X(1.7X on average) in comparison with DIA and 4.6X(2.1X on average) in comparison with CSR under the same number of threads on Intel and AMD platforms.

全文