Abstract

Algorithms for the ℓ1-minimization (ℓ1-min) problem have been widely developed. The main components of most ℓ1-min algorithms are dense matrix-vector multiplications, namely Ax and A^T x, together with vector operations. We propose a novel warp-based implementation of the matrix-vector multiplication Ax on the graphics processing unit (GPU), called the GEMV kernel, and a novel thread-based implementation of the transposed matrix-vector multiplication A^T x on the GPU, called the GEMV-T kernel. For the GEMV kernel, a self-adaptive warp allocation strategy assigns the optimal number of warps to each matrix row; similarly, the GEMV-T kernel uses a self-adaptive thread allocation strategy to assign the optimal number of threads to each matrix row. Two popular ℓ1-min algorithms, the fast iterative shrinkage-thresholding algorithm (FISTA) and the augmented Lagrangian multiplier method, are taken as examples. Based on the GEMV and GEMV-T kernels, we present two highly parallel ℓ1-min solvers on the GPU that exploit the technique of merging kernels and the sparsity of the ℓ1-min solution. Furthermore, we design a concurrent solver for multiple ℓ1-min problems on the GPU and optimize its performance using newer GPU features such as the shuffle instruction and the read-only data cache. The experimental results validate the high efficiency and good performance of our methods.
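To make the warp-based GEMV idea concrete, the following is a minimal CUDA sketch of a warp-per-row kernel that also uses the shuffle instruction mentioned above. It is an illustrative assumption, not the paper's actual kernel: the function name `gemv_warp`, the parameter `warps_per_row` (standing in for the self-adaptive warp allocation, which here is passed in rather than computed), and the row-major dense layout are all hypothetical choices.

```cuda
#include <cuda_runtime.h>

// Hypothetical warp-per-row GEMV sketch: y = A * x, with A dense, row-major,
// m rows and n columns. Each row is served by `warps_per_row` warps; the
// self-adaptive allocation strategy of the paper would choose this number
// per matrix, but here it is a plain parameter for illustration.
__global__ void gemv_warp(const float *A, const float *x, float *y,
                          int m, int n, int warps_per_row)
{
    int gtid    = blockIdx.x * blockDim.x + threadIdx.x;
    int warp_id = gtid / 32;               // global warp index
    int lane    = gtid % 32;               // lane within the warp
    int row     = warp_id / warps_per_row; // row this warp works on
    int sub     = warp_id % warps_per_row; // which of the row's warps we are
    if (row >= m) return;

    // The warps assigned to this row stride jointly over its columns.
    float sum = 0.0f;
    for (int j = sub * 32 + lane; j < n; j += warps_per_row * 32)
        sum += A[row * n + j] * x[j];

    // Warp-level reduction via the shuffle instruction (no shared memory).
    for (int off = 16; off > 0; off >>= 1)
        sum += __shfl_down_sync(0xffffffffu, sum, off);

    // Lane 0 of each warp contributes its partial sum; y must be zeroed first.
    if (lane == 0) atomicAdd(&y[row], sum);
}
```

A thread-based GEMV-T kernel would follow the same pattern with individual threads (rather than whole warps) assigned per row of A^T, which is why a finer-grained thread allocation strategy is the natural analogue there.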