Abstract

This paper presents an integrated analytical and profile-based cross-architecture performance modeling tool that provides inter-architecture performance prediction for Sparse Matrix-Vector Multiplication (SpMV) on NVIDIA GPU architectures. To design and construct the tool, we investigate the inter-architecture relative performance of multiple SpMV kernels. Given the SpMV kernel performance of a sparse matrix measured on a reference architecture, our cross-architecture performance modeling tool can accurately predict its SpMV kernel performance on a target architecture. The prediction results can help researchers choose, from a wide range of available computing architectures, the one that best fits their needs. We evaluate our tool with 14 widely used sparse matrices on four GPU architectures: NVIDIA Tesla C2050, Tesla M2090, Tesla K20m, and GeForce GTX 295. In our experiments, Tesla C2050 serves as the reference architecture, and the other three are used as target architectures. For Tesla M2090, the average differences between the predicted and measured SpMV kernel execution times for the CSR, ELL, COO, and HYB SpMV kernels are 3.1%, 5.1%, 1.6%, and 5.6%, respectively. For Tesla K20m, the corresponding averages are 6.9%, 5.9%, 4.0%, and 6.6%. For GeForce GTX 295, they are 5.9%, 5.8%, 3.8%, and 5.9%.
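
The abstract does not spell out how the average difference between predicted and measured execution times is computed. One plausible reading, assumed here, is a mean absolute relative difference taken over the test matrices. The sketch below illustrates that assumption only; the function name `mean_abs_pct_diff` and the timing values are hypothetical and are not taken from the paper.

```python
def mean_abs_pct_diff(predicted, measured):
    """Mean of |predicted - measured| / measured, expressed as a percentage.

    Assumed metric: the abstract does not give the exact formula.
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two equally sized, non-empty sequences")
    # Relative error per matrix, normalized by the measured time.
    total = sum(abs(p - m) / m for p, m in zip(predicted, measured))
    return 100.0 * total / len(measured)


# Hypothetical kernel times in milliseconds (illustrative, not from the paper).
predicted_ms = [1.10, 2.00, 0.95]
measured_ms = [1.00, 2.00, 1.00]

print(f"{mean_abs_pct_diff(predicted_ms, measured_ms):.1f}%")
```

Under this reading, one such average would be computed per SpMV kernel format (CSR, ELL, COO, HYB) and per target architecture.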

  • Publication date: 2015-9-10