Adaptive Cache and Concurrency Allocation on GPGPUs

Authors: Zheng, Zhong*; Wang, Zhiying; Lipasti, Mikko
Source: IEEE Computer Architecture Letters, 2015, 14(2): 90-93.
DOI:10.1109/LCA.2014.2359882

Abstract

Memory bandwidth is critical to GPGPU performance. Exploiting locality in caches can improve the utilization of memory bandwidth. However, memory requests issued by an excessive number of threads cause cache thrashing and saturate memory bandwidth, degrading performance. In this paper, we propose adaptive cache and concurrency allocation (CCA) to prevent cache thrashing and improve the utilization of bandwidth and computational resources, thereby improving performance. According to the locality and reuse distance of access patterns in GPGPU programs, warps on a streaming multiprocessor are dynamically divided into three groups: cached, bypassed, and waiting. The data cache accommodates the footprint of cached warps. Bypassed warps cannot allocate cache lines in the data cache, which prevents cache thrashing, but they can still exploit available memory bandwidth and computational resources. Waiting warps are de-scheduled. Experimental results show that adaptive CCA significantly improves benchmark performance, achieving an 80 percent harmonic mean IPC improvement over the baseline.
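The three-way warp partitioning described above can be illustrated with a minimal sketch. This is not the paper's hardware mechanism; it assumes a simple greedy policy in which warps are admitted to the cached group while their combined footprint fits in the data cache, overflow warps bypass the cache up to a hypothetical concurrency limit, and the remainder wait. The function name, the per-warp footprint inputs, and the `max_bypassed` limit are all illustrative assumptions.

```python
def allocate_warps(warp_footprints, cache_capacity, max_bypassed):
    """Illustrative sketch of a CCA-style partition (not the paper's
    exact algorithm): split warps into cached / bypassed / waiting.

    warp_footprints: list of (warp_id, footprint_in_lines) pairs
    cache_capacity:  data cache size in lines
    max_bypassed:    assumed cap on warps allowed to bypass the cache
    """
    cached, bypassed, waiting = [], [], []
    used = 0
    for wid, footprint in warp_footprints:
        if used + footprint <= cache_capacity:
            # Cached warps: their working set fits in the remaining cache.
            cached.append(wid)
            used += footprint
        elif len(bypassed) < max_bypassed:
            # Bypassed warps: run without allocating cache lines,
            # consuming spare bandwidth and compute resources.
            bypassed.append(wid)
        else:
            # Waiting warps: de-scheduled until resources free up.
            waiting.append(wid)
    return cached, bypassed, waiting


# Example: a 32-line cache holds two 16-line footprints; one warp
# bypasses, and the last warp waits.
groups = allocate_warps([(0, 16), (1, 16), (2, 16), (3, 16)],
                        cache_capacity=32, max_bypassed=1)
print(groups)  # → ([0, 1], [2], [3])
```

In the actual proposal the groups are adjusted dynamically at runtime based on observed locality and reuse distance, rather than decided once as in this static sketch.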