Unifying Fixed Code Mapping, Communication, Synchronization and Scheduling Algorithms for Efficient and Scalable Loop Pipelining

Mastoras Aristeidis<sup>*</sup>; Gross Thomas R

doi:10.1109/TPDS.2018.2817207

摘要

Pipelining allows the execution of loop iterations with cross-iteration dependences to overlap in time, provided that the loop body is partitioned into stages such that the data dependences are not violated. Then, the stages are mapped onto threads and communication and synchronization between stages is typically achieved using queues. Pipelining techniques that rely on static scheduling perform poorly for load-imbalanced loops. Moreover, previous research efforts that achieve load-balancing are restricted to work-stealing and imply high overhead for fine-grained loops. In this article, we present URTS, a unified runtime system with compiler support that provides a lightweight dynamic scheduler by combining mapping, communication and synchronization algorithms with a suitable data structure and an efficient ticket mechanism. Particularly, URTS shows that it is possible to combine the efficiency of static scheduling with the load-imbalance tolerance of work-stealing by using a unified design that exploits the properties of a novel data structure. The evaluation on 8- and 32-core machines shows that URTS implies low overhead, of the same order as a static scheduler, for a set of benchmarks chosen from widely-used collections. URTS is a scalable solution that performs efficient dynamic scheduling for fine-grained loops, i.e., a class of interesting loops that is poorly handled by the state-of-the-art due to high overhead.

出版日期2018-9-1

全文

访问全文

收藏分享被引(3) 浏览

更新时间：2022-08-25 07:42

Unifying Fixed Code Mapping, Communication, Synchronization and Scheduling Algorithms for Efficient and Scalable Loop Pipelining

摘要

全文

产品服务

站内浏览

服务支持

联系方式

科研之友