Iteration Interleaving-Based SIMD Lane Partition

Authors: Wang, Yaohua*; Wang, Dong; Chen, Shuming; Liu, Zonglin; Chen, Shenggang; Chen, Xiaowen; Zhou, Xu
Source: ACM Transactions on Architecture and Code Optimization, 2016, 12(4): 58.
DOI: 10.1145/2847253

Abstract

The efficacy of single instruction, multiple data (SIMD) architectures is limited when handling divergent control flows, because divergence produces SIMD fragments that use only a subset of the available lanes. We propose an iteration interleaving-based SIMD lane partition (IISLP) architecture that interleaves the execution of consecutive iterations and dynamically partitions SIMD lanes among branch paths so that the paths have comparable execution times. The benefits are twofold: SIMD fragments under divergent branches can execute in parallel, and fragment starvation is effectively eliminated. Our experiments show that IISLP doubles the performance of a baseline mechanism and provides a speedup of 28% over instruction shuffle.
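
To make the idea of partitioning lanes among branch paths concrete, the following is a minimal toy cost model, not the IISLP hardware mechanism described in the paper. All names (`baseline_cycles`, `partitioned_cycles`) and all cycle costs are hypothetical and chosen only for illustration. It assumes a loop whose body has one two-way branch, that a masked baseline serializes both paths within each group of iterations, and that a partitioned scheme can pool iterations across groups and split the lanes between the two paths.

```python
from math import ceil


def baseline_cycles(outcomes, cost_taken, cost_not_taken, lanes):
    """Toy baseline: iterations run in groups of `lanes`; a divergent group
    executes both paths back to back, with inactive lanes masked off."""
    total = 0
    for start in range(0, len(outcomes), lanes):
        group = outcomes[start:start + lanes]
        if any(group):        # at least one iteration takes the branch
            total += cost_taken
        if not all(group):    # at least one iteration falls through
            total += cost_not_taken
    return total


def partitioned_cycles(outcomes, cost_taken, cost_not_taken, lanes):
    """Toy partitioning: pool iterations by path, split the lanes between
    the two paths, and pick the split whose slower side finishes earliest."""
    n_taken = sum(outcomes)
    n_not = len(outcomes) - n_taken
    if n_taken == 0:
        return ceil(n_not / lanes) * cost_not_taken
    if n_not == 0:
        return ceil(n_taken / lanes) * cost_taken
    best = None
    for lanes_taken in range(1, lanes):
        lanes_not = lanes - lanes_taken
        # Both partitions run in parallel, so the slower one sets the time.
        t = max(ceil(n_taken / lanes_taken) * cost_taken,
                ceil(n_not / lanes_not) * cost_not_taken)
        if best is None or t < best:
            best = t
    return best


# Hypothetical workload: 16 iterations alternating between the two paths,
# 4 lanes, and a taken path three times as long as the fall-through path.
outcomes = [True, False] * 8
print(baseline_cycles(outcomes, 12, 4, 4))     # 64
print(partitioned_cycles(outcomes, 12, 4, 4))  # 36
```

In this toy setting the best split gives three lanes to the longer path and one lane to the shorter one, so the two partitions finish at similar times (36 and 32 cycles). The figures come from the assumed costs above and say nothing about the performance results reported in the abstract.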