Scatter-Gather-Merge: An efficient star-join query processing algorithm for data-parallel frameworks

Han Hyuck; Jung Hyungsoo*; Eom Hyeonsang; Yeom Heon Y

doi:10.1007/s10586-010-0144-5

Abstract

A data-parallel framework is very attractive for large-scale data processing because it lets applications process huge amounts of data on commodity machines with little effort. MapReduce, a popular data-parallel framework, is used in fields such as web search, data mining and data warehousing, and it has proven very practical for such data-parallel applications. The star-join query is a common query type in data warehouses, which are a current target domain of data-parallel frameworks. This article proposes a new algorithm that efficiently processes star-join queries in data-parallel frameworks such as MapReduce and Dryad. Our star-join algorithm for general data-parallel frameworks is called Scatter-Gather-Merge, and it processes star-join queries in a constant number of computation steps even as the number of participating dimension tables increases. By adopting Bloom filters, Scatter-Gather-Merge avoids a non-trivial amount of I/O. We also show that Scatter-Gather-Merge can be easily applied to MapReduce. Our experimental results in both cluster and cloud environments show that Scatter-Gather-Merge outperforms existing approaches.
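To illustrate why Bloom filters can save I/O in a star join, the following is a minimal single-process sketch, not the Scatter-Gather-Merge algorithm itself, whose phases the abstract does not detail. It assumes each dimension table has already been reduced to the rows that pass the query's predicates, builds one Bloom filter per dimension over the surviving keys, and discards fact rows that fail any filter before the exact join. The names `BloomFilter`, `star_join`, and the sample columns are hypothetical.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def star_join(fact_rows, dimensions):
    """Join fact rows with several dimension tables.

    dimensions maps a foreign-key column name of the fact table to a dict
    {key: attributes} holding only the dimension rows that passed the
    query predicates (an assumption of this sketch).
    """
    # Build one filter per dimension over the qualifying keys.
    filters = {}
    for fk, dim in dimensions.items():
        bloom = BloomFilter()
        for key in dim:
            bloom.add(key)
        filters[fk] = bloom

    # Cheap pre-filter: in a distributed setting, rows dropped here would
    # never be written out or shuffled, which is where I/O is saved.
    candidates = [
        row for row in fact_rows
        if all(filters[fk].might_contain(row[fk]) for fk in dimensions)
    ]

    # Exact join: removes Bloom-filter false positives and attaches attributes.
    result = []
    for row in candidates:
        if all(row[fk] in dim for fk, dim in dimensions.items()):
            joined = dict(row)
            for fk, dim in dimensions.items():
                joined.update(dim[row[fk]])
            result.append(joined)
    return result


facts = [
    {"date_id": 1, "store_id": 10, "amount": 5.0},
    {"date_id": 2, "store_id": 10, "amount": 7.0},
    {"date_id": 1, "store_id": 20, "amount": 9.0},
]
dims = {
    "date_id": {1: {"year": 2010}},
    "store_id": {10: {"city": "Seoul"}},
}
joined_rows = star_join(facts, dims)
```

Because the exact lookup runs after the filter, false positives cost only some extra work and never change the result; only the first fact row above matches both dimensions.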

Publication date: 2011-06


