Abstract

In this paper, the performance of the Cyclic Reduction (CR) algorithm for solving tridiagonal systems is improved through efficient global memory transactions on Graphics Processing Units (GPUs). To achieve maximum memory throughput with a lower computational runtime, two sorting algorithms are introduced for reordering the initial system of equations: direct and step-by-step. It is shown that the latter method is well suited to modern GPUs and achieves a speedup of up to 3.47x in single precision and 2.1x in double precision compared to the CPU Thomas algorithm. By benefiting from the new global memory implementation, the CR solver can run 2x to 100x faster than previously published parallel tridiagonal solvers. The CR solver is also applied to 2D and 3D compressible viscous flow simulations using a high-order compact finite-difference scheme. There, the filtering, primitive-variable, and flux-derivative calculations are carried out with the parallel tridiagonal solver on the GPU. The GPU-accelerated calculations achieve speedups of 1.9x to 15.2x in 2D and 6.4x to 20.3x in 3D simulations across different grid sizes compared to CPU computations. The computations are performed on an NVIDIA GTX480 GPU, and the runtimes are compared to those obtained on a single core of an Intel Core 2 Duo (2.7 GHz, 2 MB cache).
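
For readers unfamiliar with the two solvers named above, the following is a minimal serial sketch in Python of the Thomas algorithm and of textbook cyclic reduction. It is not the paper's GPU implementation: the function names `thomas` and `cyclic_reduction` and the list-based storage are assumptions made for illustration, and the reordering (sorting) of the system and the global memory access pattern are not modelled at all. Each system is given as sub-diagonal `a`, diagonal `b`, super-diagonal `c`, and right-hand side `d`, with `a[0]` and `c[-1]` equal to zero, and is assumed to be diagonally dominant so that no pivoting is needed.

```python
def thomas(a, b, c, d):
    """Sequential Thomas algorithm: forward elimination, then back substitution."""
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def cyclic_reduction(a, b, c, d):
    """Recursive cyclic reduction for a tridiagonal system of any size n >= 1."""
    n = len(d)
    if n == 1:
        return [d[0] / b[0]]

    # Forward reduction: eliminate the even-indexed unknowns so that only the
    # odd-indexed ones remain. Every iteration of this loop is independent of
    # the others, which is what makes the method attractive on parallel hardware.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = a[i] / b[i - 1]
        if i + 1 < n:
            gamma = c[i] / b[i + 1]
            a_next, c_next, d_next = a[i + 1], c[i + 1], d[i + 1]
        else:
            # Last row has no lower neighbour.
            gamma = a_next = c_next = d_next = 0.0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - gamma * a_next)
        rc.append(-gamma * c_next)
        rd.append(d[i] - alpha * d[i - 1] - gamma * d_next)

    # Solve the half-size tridiagonal system for the odd-indexed unknowns.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Backward substitution: recover each even-indexed unknown from its
    # already known odd-indexed neighbours (again independent per index).
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

The Thomas algorithm carries a dependency from each row to the next, so it is inherently sequential, whereas cyclic reduction halves the system at every level and the work within a level can be done concurrently. A common in-place formulation touches rows at a stride that doubles per level; reordering the equations, as the abstract describes, is one way to keep such accesses contiguous in memory.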

  • Publication date: 2014-3-20