摘要

Pipelining allows the execution of loop iterations with cross-iteration dependences to overlap in time, provided that the loop body is partitioned into stages such that the data dependences are not violated. Then, the stages are mapped onto threads and communication and synchronization between stages is typically achieved using queues. Pipelining techniques that rely on static scheduling perform poorly for load-imbalanced loops. Moreover, previous research efforts that achieve load-balancing are restricted to work-stealing and imply high overhead for fine-grained loops. In this article, we present URTS, a unified runtime system with compiler support that provides a lightweight dynamic scheduler by combining mapping, communication and synchronization algorithms with a suitable data structure and an efficient ticket mechanism. Particularly, URTS shows that it is possible to combine the efficiency of static scheduling with the load-imbalance tolerance of work-stealing by using a unified design that exploits the properties of a novel data structure. The evaluation on 8- and 32-core machines shows that URTS implies low overhead, of the same order as a static scheduler, for a set of benchmarks chosen from widely-used collections. URTS is a scalable solution that performs efficient dynamic scheduling for fine-grained loops, i.e., a class of interesting loops that is poorly handled by the state-of-the-art due to high overhead.

  • 出版日期2018-9-1