A model-driven blocking strategy for load balanced sparse matrix-vector multiplication on GPUs

Ashari Arash*; Sedaghati Naser; Eisenlohr John; Sadayappan P

doi:10.1016/j.jpdc.2014.11.001

Abstract

Sparse Matrix-Vector multiplication (SpMV) is one of the key operations in linear algebra. Overcoming thread divergence, load imbalance, and uncoalesced and indirect memory access due to sparsity and irregularity are challenges to optimizing SpMV on GPUs. In this paper we present a new Blocked Row-Column (BRC) storage format with a two-dimensional blocking mechanism that addresses these challenges effectively. It reduces thread divergence by reordering rows of the input matrix and blocking rows with a nearly equal number of non-zero elements onto the same execution units (i.e., warps). BRC improves load balance by partitioning rows into blocks with a constant number of non-zeros, such that different warps perform the same amount of work. We also present an approach to optimize BRC performance by judicious selection of block size based on the sparsity characteristics of the matrix. A CUDA implementation of BRC outperforms the NVIDIA CUSP and cuSPARSE libraries and other state-of-the-art SpMV formats on a range of unstructured sparse matrices from multiple application domains. The BRC format has been integrated with PETSc, enabling its use in PETSc's solvers. Furthermore, when partitioning the input matrix, BRC achieves near-linear speedup on multiple GPUs.
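The two-dimensional blocking idea described in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' BRC layout or their CUDA kernel; it only assumes a CSR input, sorts rows by non-zero count so that rows of similar length share a block, and caps the number of columns per block so that long rows are split. The names `build_blocks`, `spmv_blocks`, `max_cols`, and `rows_per_block` are hypothetical, and the fixed `max_cols` stands in for the model-driven block-size selection mentioned in the abstract.

```python
WARP_SIZE = 32  # assumed rows per block, chosen to match a CUDA warp


def build_blocks(indptr, indices, data, max_cols, rows_per_block=WARP_SIZE):
    """Group CSR rows into zero-padded 2D blocks (illustrative only)."""
    n_rows = len(indptr) - 1
    row_nnz = [indptr[r + 1] - indptr[r] for r in range(n_rows)]
    # Reorder rows by descending non-zero count so that rows sharing a
    # block have similar lengths (less padding, less divergence).
    order = sorted(range(n_rows), key=lambda r: -row_nnz[r])
    blocks = []
    for start in range(0, n_rows, rows_per_block):
        rows = order[start:start + rows_per_block]
        longest = row_nnz[rows[0]]  # rows are sorted, so the first is longest
        # Split along the column direction: each block holds at most
        # max_cols entries per row, bounding the work per block.
        for offset in range(0, longest, max_cols):
            width = min(max_cols, longest - offset)
            cols, vals = [], []
            for r in rows:
                lo = min(indptr[r] + offset, indptr[r + 1])
                hi = min(lo + width, indptr[r + 1])
                pad = width - (hi - lo)
                # Pad short rows with explicit zeros pointing at column 0.
                cols.append(list(indices[lo:hi]) + [0] * pad)
                vals.append(list(data[lo:hi]) + [0.0] * pad)
            blocks.append((rows, cols, vals))
    return blocks


def spmv_blocks(blocks, x, n_rows):
    """Compute y = A @ x from the blocks; padded entries contribute zero."""
    y = [0.0] * n_rows
    for rows, cols, vals in blocks:
        for r, c_row, v_row in zip(rows, cols, vals):
            y[r] += sum(v * x[c] for c, v in zip(c_row, v_row))
    return y
```

On a GPU, each block would presumably map to one warp with one thread per row; here the inner loops simply play that role sequentially. Because a long row can span several blocks, partial results for the same row are accumulated into `y`.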

Publication date: 2015-02

