Architectural Support for Address Translation on GPUs Designing Memory Management Units for CPU/GPUs with Unified Address Spaces

Pichai Bharath<sup>*</sup>; Hsu Lisa; Bhattacharjee Abhishek

doi:10.1145/2541940.2541942

摘要

The proliferation of heterogeneous compute platforms, of which CPU/GPU is a prevalent example, necessitates a manageable programming model to ensure widespread adoption. A key component of this is a shared unified address space between the heterogeneous units to obtain the programmability benefits of virtual memory. To this end, we explore GPU Memory Management Units (MMUs) consisting of Translation Lookaside Buffers (TLBs) and page table walkers (PTWs) in unified heterogeneous systems. We show the challenges posed by GPU warp schedulers on TLBs accessed in parallel with L1 caches, which provide many well-known programmability benefits. In response, we propose modest TLB and PTW augmentations that recover most of the performance lost by introducing L1-parallel TLB access. We also show that a little TLB-awareness can make other GPU performance enhancements (e.g., cache-conscious warp scheduling and dynamic warp formation on branch divergence) feasible in the face of cache-parallel address translation, bringing overheads in the range deemed acceptable for CPUs (10-15% of runtime). We presume this initial design leaves room for improvement but anticipate the bigger insight, that a little TLB-awareness goes a long way in GPUs, will spur further work in this area.

出版日期2014-4

全文

访问全文

收藏分享被引(3) 浏览

更新时间：2019-03-27 13:41

Architectural Support for Address Translation on GPUs Designing Memory Management Units for CPU/GPUs with Unified Address Spaces

摘要

全文

产品服务

站内浏览

服务支持

联系方式

科研之友