Autotuning divide-and-conquer stencil computations

Authors: Natarajan Ekanathan Palamadai*; Dehnavi Maryam Mehri; Leiserson Charles
Source: Concurrency and Computation: Practice and Experience, 2017, 29(17): e4127.
DOI:10.1002/cpe.4127

Abstract

This paper explores autotuning strategies for serial divide-and-conquer stencil computations, comparing the efficacy of traditional heuristic autotuning with that of pruned-exhaustive autotuning. We present a pruned-exhaustive autotuner called Ztune that searches for optimal divide-and-conquer trees for stencil computations. Ztune uses three pruning properties (space-time equivalence, divide subsumption, and favored dimension) that greatly reduce the size of the search domain without significantly sacrificing the quality of the autotuned code. We compared the performance of Ztune with that of a state-of-the-art heuristic autotuner called OpenTuner in tuning the divide-and-conquer algorithm used in the Pochoir stencil compiler. Over a nightly run on ten application benchmarks across two machines with different hardware configurations, the Ztuned code ran 5% to 12% faster on average than Pochoir's default code. Over the same runs, the OpenTuner-tuned code ranged from 9% slower to 2% faster on average. In the best case, the Ztuned code ran 40% faster than Pochoir's code, and the OpenTuner-tuned code ran 33% faster. Ztune's autotuning time for each benchmark could be measured in minutes. By contrast, OpenTuner typically needed hours or days to achieve comparable results. Surprisingly, for some benchmarks, Ztune finished autotuning in less time than it takes to perform the stencil computation once.

  • Publication date: 2017-09-10
  • Affiliations: Rutgers; MIT
