Automatic tuning of sparse matrix-vector multiplication on multicore clusters

Authors: Li ShiGang*; Hu ChangJun; Zhang JunChao; Zhang YunQuan
Source: Science China Information Sciences, 2015, 58(9): 092102.
DOI: 10.1007/s11432-014-5254-x

Abstract

To achieve good performance and scalability on multicore clusters, parallel applications must be carefully optimized to exploit intra-node parallelism and reduce inter-node communication. This paper investigates automatic tuning of the sparse matrix-vector multiplication (SpMV) kernel implemented in a partitioned global address space (PGAS) language, which supports a hybrid thread- and process-based communication layer for multicore systems. One-sided communication is used for inter-node data exchange, while intra-node communication uses a mix of process shared memory and multithreading. We develop performance models to help select the best hybrid configuration of threads and processes, as well as the best communication pattern for SpMV. As a result, our tuned SpMV in the hybrid runtime environment consumes less memory and reduces inter-node communication volume without harming data locality. Experiments are conducted on 12 real sparse matrices. On 16-node Xeon and 8-node Opteron clusters, our tuned SpMV kernel achieves on average 1.4x and 1.5x improvement in performance, respectively, over a well-optimized process-based message-passing implementation.
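
For readers unfamiliar with the kernel, the following is a minimal sketch of SpMV and of why the partitioning granularity affects communication volume. It is not the paper's implementation or its performance model: it assumes the common compressed sparse row (CSR) layout and a contiguous block-row partitioning in which the vector x is distributed the same way as the rows. The function names `spmv_csr` and `remote_x_entries` and the 4x4 example matrix are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def spmv_csr(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A * x for a matrix A stored in CSR form (illustrative)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Nonzeros of this row occupy positions row_ptr[row] .. row_ptr[row+1]-1.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def remote_x_entries(row_ptr: Sequence[int], col_idx: Sequence[int],
                     num_parts: int) -> List[int]:
    """Count the distinct x entries each partition must fetch from others.

    Assumption: rows and x are split into contiguous blocks of equal size
    (the last block may be smaller). This is a toy communication count,
    not the performance model described in the abstract.
    """
    n = len(row_ptr) - 1
    block = -(-n // num_parts)  # ceiling division
    counts = []
    for part in range(num_parts):
        lo, hi = part * block, min((part + 1) * block, n)
        needed = set()
        for row in range(lo, hi):
            for k in range(row_ptr[row], row_ptr[row + 1]):
                col = col_idx[k]
                if not (lo <= col < hi):  # x[col] is owned by another partition
                    needed.add(col)
        counts.append(len(needed))
    return counts


# Hypothetical 4x4 matrix:
# [[10, 0, 0, 2],
#  [ 0, 3, 0, 0],
#  [ 0, 0, 5, 1],
#  [ 4, 0, 0, 6]]
row_ptr = [0, 2, 3, 5, 7]
col_idx = [0, 3, 1, 2, 3, 0, 3]
values = [10.0, 2.0, 3.0, 5.0, 1.0, 4.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0]

y = spmv_csr(row_ptr, col_idx, values, x)
two_parts = remote_x_entries(row_ptr, col_idx, 2)
four_parts = remote_x_entries(row_ptr, col_idx, 4)
```

In this toy case, splitting the rows into four partitions requires three remote vector entries in total, while two partitions require two. This mirrors, in a very simplified way, the abstract's point that grouping work inside a node (for example with threads that share memory) can reduce the volume of data exchanged between partitions.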

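  • Publication date: 2015-09
  • Affiliations: University of Science and Technology Beijing; Chinese Academy of Sciences; State Key Laboratory of Computer Architecture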