Asymptotically Optimal Merging on ManyCore GPUs

作者:Kutzner Arne; Kim Pok Son*; Park Won Kwang
来源:IEICE Transactions on Information and Systems, 2012, E95D(12): 2769-2777.
DOI:10.1587/transinf.E95.D.2769

摘要

We propose a family of algorithms for efficiently merging on contemporary GPUs, so that each algorithm requires O(m log(n/m + 1)) element comparisons, where m and n are the sizes of the input sequences with m %26lt;= n. According to the lower bounds for merging all proposed algorithms are asymptotically optimal regarding the number of necessary comparisons. First we introduce a parallely structured algorithm that splits a merging problem of size 2(l) into 2(i) subproblems of size 2(l-i), for some arbitrary i with (0 %26lt;= i %26lt;= l). This algorithm represents a merger for i = 1 but it is rather inefficient in this case. The efficiency is boosted by moving to a two stage approach where the splitting process stops at some predetermined level and transfers control to several parallely operating block-mergers. We formally prove the asymptotic optimality of the splitting process and show that for symmetrically sized inputs our approach delivers up to 4 times faster runtimes than the thrust: :merge function that is part of the Thrust library. For assessing the value of our merging technique in the context of sorting we construct and evaluate a MergeSort on top of it. In the context of our benchmarking the resulting MergeSort clearly outperforms the MergeSort implementation provided by the Thrust library as well as Cederman%26apos;s GPU optimized variant of QuickSort.

  • 出版日期2012-12

全文