Abstract

Accelerator-based chip multiprocessors (ACMPs), which combine application-specific hardware accelerators (ACCs) with host processor core(s), are promising architectures for high-performance, power-efficient computing. However, ACMPs with many ACCs face scalability limitations: the ACCs' performance benefits can be overshadowed by bottlenecks on shared resources, namely the processor core(s), the communication fabric/DMA, and on-chip memory. These bottlenecks stem primarily from the ACCs' data accesses and their dependence on the host for orchestration. Because ACC communication semantics are only loosely defined and are realized on general-purpose architectures, contention on these shared resources hampers performance. This paper explores and alleviates the scalability limitations of ACMPs. To this end, it first proposes ACMPerf, an analytical model that captures the impact of shared-resource bottlenecks on the achievable ACC benefits. It then identifies and formalizes ACC communication semantics, paving the path toward a more scalable integration of ACCs. The semantics describe four primary aspects: 1) data access; 2) data granularity; 3) data marshalling; and 4) synchronization. Finally, this paper proposes a novel architecture of transparent self-synchronizing accelerators (TSS). TSS efficiently realizes the identified communication semantics for direct ACC-to-ACC connections, which often occur in streaming applications, and thus delivers more of the ACCs' benefits than conventional ACMP architectures. Given the same set of ACCs, TSS achieves up to 130x higher throughput and 78x lower energy consumption, mainly by reducing the load on shared architectural resources by 78.3x.