MapReduce Parallel Programming Model: A State-of-the-Art Survey

Authors: Li, Ren*; Hu, Haibo; Li, Heng; Wu, Yunsong; Yang, Jianxi
Source: International Journal of Parallel Programming, 2016, 44(4): 832-866.
DOI: 10.1007/s10766-015-0395-0

Abstract

With the development of information technologies, we have entered the era of Big Data. Google's MapReduce programming model and its open-source implementation in Apache Hadoop have become the dominant approach to data-intensive processing because of their simplicity, scalability, and fault tolerance. However, several inherent limitations, such as the lack of efficient scheduling and iterative computing mechanisms, seriously affect the efficiency and flexibility of MapReduce. To date, various approaches have been proposed to extend the MapReduce model and improve runtime efficiency for different scenarios. In this review, we assess MapReduce to help researchers better understand the novel optimizations that have been developed to address its limitations. We first present the basic idea underlying the MapReduce paradigm and describe several widely used open-source runtime systems. We then discuss the main shortcomings of the original MapReduce. We also review the MapReduce optimization approaches that have recently been put forward and categorize them according to their characteristics and capabilities. Finally, we conclude the paper and suggest several directions for future research.