Decoupled iteration mapping: improving dependency-loop performance on SIMD processors

Yang, Hui<sup>*</sup>; Chen, Shuming; Wan, Jianghua; Dai, Huanyao

doi:10.1587/elex.10.20130798

Abstract

Wide Single Instruction Multiple Data (SIMD) architectures are important for compute-intensive applications, but they are less efficient for applications with cross-iteration dependency loops, which are difficult to parallelize and vectorize. This paper introduces Decoupled Iteration Mapping (DIM), a technique that dynamically maps a cross-iteration dependency loop onto an improved SIMD architecture to achieve multicore-like thread-parallel performance. The modification to the baseline architecture is minor and consists of a Prefetch Unit & Instruction Buffer Array (PU&IBA), a Loop Control Unit & Instruction Dispatch Unit (LCU&IDU), and a Data Buffer Chain (DBC). Experimental results show that the proposed DIM scheme achieves an average 3.04x performance speedup at a cost of only 6.44% area overhead.
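To make the term "cross-iteration dependency loop" concrete, the following minimal sketch contrasts a loop whose iterations are independent with one whose iterations are chained. It is an illustration only and is not taken from the paper: the function names and the recurrence are hypothetical, and the code does not model DIM or any of its hardware units.

```python
def independent_loop(a, b):
    """Each iteration reads only a[i] and b[i], so iterations can run
    in any order, or side by side in SIMD lanes."""
    return [x + y for x, y in zip(a, b)]


def dependent_loop(a, seed=0):
    """Iteration i consumes the value produced by iteration i - 1
    (a loop-carried dependency), so a plain SIMD unit cannot simply
    execute all iterations at once.
    The recurrence prev // 2 + x is a hypothetical example."""
    out = []
    prev = seed
    for x in a:
        prev = prev // 2 + x  # depends on the previous iteration's result
        out.append(prev)
    return out
```

The first function is the kind of loop that vectorizes directly; the second is the kind of loop the abstract describes as difficult to parallelize and vectorize.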

Publication date: 2013
Affiliation: National University of Defense Technology (Chinese People's Liberation Army)


Last updated: 2019-08-15 06:53
