A run-time optimization approach for reducing data movements using locality-aware searching

Authors: Li Liang; Wang Endong; Zhang Xingjun; Yan Kang; Ju Tao; Dong Xiaoshe*
Source: Journal of Supercomputing, 2014, 69(2): 864-886.
DOI: 10.1007/s11227-014-1186-x

Abstract

The CPU-GPU communication bottleneck limits the performance gains of GPU applications in heterogeneous GPGPU systems and is usually addressed through data reuse optimization. This paper analyzes data reuse through a DAG abstraction and derives rules showing that run-time data reuse optimization can effectively relieve the bottleneck. Based on these rules, the paper proposes a run-time optimization framework for data reuse, called R-Tracker. R-Tracker uses a locality-aware searching approach to handle reuses. It not only implements data reuse optimization at low cost but also carries out the searching, the data transfers, and the GPU computation concurrently. R-Tracker relaxes the constraints required by compiler-based approaches and thus achieves better reuse. The experimental results show that R-Tracker improves performance by 1.77-16.42% over the compiler-based approach OpenMPC and by 1.40-8.39% over CGCM in single-node execution, and by 48.78-60% over CGCM in multi-node execution.
