A New Efficient Resource Management Framework for Iterative MapReduce Processing in Large-Scale Data Analysis

Hong Seungtae; Park Kyongseok; Lim Chae Deok; Chang Jae Woo<sup>*</sup>

doi:10.1587/transinf.2016DAP0013

Abstract

Hadoop, one of the most popular MapReduce frameworks, has been studied actively as a means of analyzing large-scale data efficiently. Many large-scale data analysis applications, such as data clustering, must apply the same map and reduce functions repeatedly. However, Hadoop cannot provide optimal performance for such iterative MapReduce jobs because it derives a result from a single phase of map and reduce functions. To solve this problem, we propose a new efficient resource management framework for iterative MapReduce processing in large-scale data analysis. First, we design an iterative job state-machine for managing iterative MapReduce jobs. Second, we propose an invariant data caching mechanism that reduces the I/O cost of data accesses. Third, we propose an iterative resource management technique for efficiently managing the resources of a Hadoop cluster. Fourth, we devise a stop condition check mechanism that prevents unnecessary computation. Finally, we show the performance superiority of the proposed framework by comparing it with existing frameworks.
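As an illustration only, the following minimal sketch shows the general pattern the abstract refers to: an iterative job that reuses invariant input data across iterations and checks a stop condition after each one. It is a single-process, one-dimensional k-means written in plain Python, not the proposed framework and not Hadoop code. The names `map_phase`, `reduce_phase`, `iterative_kmeans`, `load_points`, `tol`, and `max_iters` are all hypothetical.

```python
from collections import defaultdict


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every input point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points assigned to each centroid."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # A centroid with no assigned points keeps its previous position.
    return [sum(groups[i]) / len(groups[i]) if groups[i] else c
            for i, c in enumerate(centroids)]


def iterative_kmeans(load_points, centroids, tol=1e-6, max_iters=100):
    """Run map and reduce repeatedly until the centroids stop moving."""
    # Invariant data: loaded once and reused in every iteration
    # (a stand-in for caching; assumed here, not the paper's mechanism).
    points = list(load_points())
    iterations = 0
    for _ in range(max_iters):
        iterations += 1
        updated = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(abs(a - b) for a, b in zip(updated, centroids))
        centroids = updated
        # Stop condition check: skip further iterations once converged.
        if shift <= tol:
            break
    return centroids, iterations
```

Calling `iterative_kmeans(lambda: [1.0, 2.0, 9.0, 10.0], [0.0, 5.0])` invokes the loader once and iterates until the largest centroid shift falls to `tol` or below. On a stock Hadoop cluster, each iteration would instead be submitted as a separate job that rereads its input, which is the overhead the abstract describes.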

Publication date: 2017-04
