Asymptotically Optimal Merging on ManyCore GPUs

Kutzner Arne; Kim Pok Son<sup>*</sup>; Park Won Kwang

doi:10.1587/transinf.E95.D.2769

摘要

We propose a family of algorithms for efficiently merging on contemporary GPUs, so that each algorithm requires O(m log(n/m + 1)) element comparisons, where m and n are the sizes of the input sequences with m %26lt;= n. According to the lower bounds for merging all proposed algorithms are asymptotically optimal regarding the number of necessary comparisons. First we introduce a parallely structured algorithm that splits a merging problem of size 2(l) into 2(i) subproblems of size 2(l-i), for some arbitrary i with (0 %26lt;= i %26lt;= l). This algorithm represents a merger for i = 1 but it is rather inefficient in this case. The efficiency is boosted by moving to a two stage approach where the splitting process stops at some predetermined level and transfers control to several parallely operating block-mergers. We formally prove the asymptotic optimality of the splitting process and show that for symmetrically sized inputs our approach delivers up to 4 times faster runtimes than the thrust: :merge function that is part of the Thrust library. For assessing the value of our merging technique in the context of sorting we construct and evaluate a MergeSort on top of it. In the context of our benchmarking the resulting MergeSort clearly outperforms the MergeSort implementation provided by the Thrust library as well as Cederman%26apos;s GPU optimized variant of QuickSort.

出版日期2012-12

全文

访问全文

收藏分享被引浏览

更新时间：2018-04-10 20:25

Asymptotically Optimal Merging on ManyCore GPUs

摘要

全文

产品服务

站内浏览

服务支持

联系方式

科研之友