An Optimal Microarchitecture for Stencil Computation Acceleration Based on Nonuniform Partitioning of Data Reuse Buffers

Cong, Jason*; Li, Peng; Xiao, Bingjun*; Zhang, Peng

doi:10.1109/TCAD.2015.2488491

Abstract

High-level synthesis (HLS) tools have made significant progress in compiling high-level descriptions of computation into highly pipelined register-transfer level (RTL) specifications. Such high-throughput computation creates a high demand for data. To keep data accesses from becoming the bottleneck, on-chip memories are used as data reuse buffers to reduce off-chip accesses. Memory partitioning has also been explored to increase memory bandwidth by scheduling multiple simultaneous memory accesses to different memory banks. Prior work on memory partitioning of data reuse buffers is limited to uniform partitioning. In this paper, we perform an early-stage exploration of nonuniform memory partitioning, using stencil computation, a popular communication-intensive application domain, as a case study to show its potential benefits. Our method always achieves both the minimum memory size and the minimum number of memory banks, which no prior work can guarantee. We develop a generalized microarchitecture that decouples stencil accesses from computation, and an automated design flow that integrates this microarchitecture with the HLS-generated computation kernel to form a complete accelerator.
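To make the idea of a nonuniformly partitioned reuse buffer concrete, the Python sketch below is an illustration only, not the authors' implementation. It assumes a 2-D array streamed in row-major order, one element per cycle, with array boundaries ignored. The reuse buffer is modeled as a chain of FIFOs placed between consecutive stencil accesses (sorted in stream order), and each FIFO is assumed to occupy its own memory bank. Each FIFO's length is the linearized distance between two adjacent accesses, so the segments generally differ in size. In this model, the total storage equals the distance between the first and last access, and a stencil with n points uses n-1 segments. The helper names `linearize`, `segment_sizes`, and `stream_stencil` are hypothetical.

```python
from collections import deque
from typing import Iterator, List, Optional, Sequence, Tuple

Offset = Tuple[int, int]  # (row offset, column offset) of one stencil access


def linearize(stencil: Sequence[Offset], width: int) -> List[int]:
    """Map 2-D offsets to row-major linear offsets, sorted in stream order."""
    return sorted(di * width + dj for di, dj in stencil)


def segment_sizes(stencil: Sequence[Offset], width: int) -> List[int]:
    """Length of each reuse-buffer segment: the gap between consecutive accesses."""
    lin = linearize(stencil, width)
    return [b - a for a, b in zip(lin, lin[1:])]


def stream_stencil(
    stream: Sequence[int], stencil: Sequence[Offset], width: int
) -> Iterator[Tuple[int, List[Optional[int]]]]:
    """Yield (center_index, taps) once per input cycle.

    taps[k] holds the element at linear offset lin[k] from the center, or None
    while the buffer is still filling. Boundary handling is deliberately omitted.
    """
    lin = linearize(stencil, width)
    # fifos[k] delays data moving from tap k+1 (newer) to tap k (older);
    # each FIFO is assumed to be a separate memory bank.
    fifos = [deque([None] * g) for g in segment_sizes(stencil, width)]
    for t, x in enumerate(stream):
        taps: List[Optional[int]] = [None] * len(lin)
        taps[-1] = x  # the newest access is served directly from the input stream
        carry: Optional[int] = x
        for k in range(len(fifos) - 1, -1, -1):
            fifos[k].append(carry)       # push the value seen at tap k+1
            carry = fifos[k].popleft()   # pop the value pushed gap[k] cycles ago
            taps[k] = carry
        yield t - lin[-1], taps
```

For a 5-point stencil `[(-1, 0), (0, -1), (0, 0), (0, 1), (1, 0)]` on rows of width 8, `segment_sizes` returns `[7, 1, 1, 7]`. That is four unequal segments holding 16 elements in total, which is two rows. Once the buffer has filled, every tap is read from a different segment in each cycle.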

Publication date: 2016-3
