Abstract

Sparse matrix-vector multiplication (SpMV) is of great importance in scientific computing. Graphics processing unit (GPU)-accelerated SpMV for large-scale problems has recently attracted considerable attention. We observe that on a given multi-GPU platform, SpMV performance can usually be greatly improved when a matrix is partitioned into several blocks according to a predetermined rule and each block is assigned to a GPU with an appropriate storage format. This observation motivates us to propose a novel multi-GPU parallel SpMV optimization framework, which comprises the following parts: (1) a simple rule for dividing any given matrix among multiple GPUs; (2) a performance model, independent of the problem and dependent only on the resources of the device, that accurately predicts the execution time of SpMV kernels; and (3) a selection algorithm that, on the basis of the performance model, automatically selects the most appropriate storage format among those supported by the framework for the matrix block assigned to each GPU. The objective of our framework is not to construct a new storage format or algorithm but to automatically and rapidly generate an optimized parallel SpMV for any sparse matrix on a given multi-GPU platform by integrating existing storage formats and their corresponding kernels. We take five popular storage formats as examples to present the idea of constructing the framework. We theoretically validate the correctness of the proposed SpMV performance model, which is constructed only once for each type of GPU. Moreover, the framework is general and easily extensible: a storage format not yet included can be incorporated once the performance model of its corresponding SpMV kernel is constructed. Experiments validate the efficiency of the proposed framework.
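To make the format-selection step concrete, the following is a minimal C++ sketch, not the paper's actual implementation: the format names (CSR, ELL), the block features, and the model coefficients are all hypothetical placeholders. Only the overall pattern reflects the framework's idea, namely that per-format performance models are fitted offline once per GPU type, and at run time each GPU's block is given the format with the lowest predicted kernel time.

```cpp
#include <cstdio>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Simple features of a matrix block assigned to one GPU (illustrative choice).
struct BlockFeatures {
    std::size_t rows;         // number of rows in the block
    std::size_t nnz;          // number of nonzeros in the block
    double      avg_row_nnz;  // nnz / rows
    double      row_nnz_var;  // variance of per-row nonzero counts
};

// One storage format together with its fitted performance model,
// which predicts SpMV kernel time (ms) on a specific GPU type.
struct FormatModel {
    std::string name;
    double (*predict_ms)(const BlockFeatures&);
};

// Selection algorithm: pick the format whose model predicts the
// lowest kernel time for this block.
const FormatModel& select_format(const std::vector<FormatModel>& models,
                                 const BlockFeatures& f) {
    const FormatModel* best = &models.front();
    double best_ms = std::numeric_limits<double>::max();
    for (const auto& m : models) {
        double t = m.predict_ms(f);
        if (t < best_ms) { best_ms = t; best = &m; }
    }
    return *best;
}

// Placeholder predictors with made-up coefficients; a real framework
// would fit these offline, once per GPU type.
static double csr_model(const BlockFeatures& f) { return 0.010 + 1.0e-6 * f.nnz; }
static double ell_model(const BlockFeatures& f) { return 0.008 + 1.2e-6 * f.rows * f.avg_row_nnz; }

int main() {
    std::vector<FormatModel> models = { {"CSR", csr_model}, {"ELL", ell_model} };
    BlockFeatures block{1u << 20, 8u << 20, 8.0, 2.5};  // dummy block features
    std::printf("chosen format: %s\n", select_format(models, block).name.c_str());
    return 0;
}
```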