Distributed Data Management Using MapReduce

Authors: Li, Feng*; Ooi, Beng Chin; Oezsu, M. Tamer; Wu, Sai
Source: ACM Computing Surveys, 2014, 46(3): 31.
DOI: 10.1145/2503009

Abstract

MapReduce is a framework for processing and managing large-scale datasets on a distributed cluster. It has been used for applications such as generating search indexes, document clustering, access log analysis, and various other forms of data analytics. MapReduce adopts a flexible computation model with a simple interface consisting of map and reduce functions, whose implementations can be customized by application developers. Since its introduction, a substantial amount of research effort has been directed toward making it more usable and efficient for supporting database-centric operations. In this article, we aim to provide a comprehensive review of a wide range of proposals and systems that focus fundamentally on the support of distributed data management and processing using the MapReduce framework.
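
To make the map/reduce interface mentioned in the abstract concrete, the following is a minimal single-process sketch in Python. It is not taken from the article: the word-count task and the names `map_word_count`, `reduce_word_count`, and `run_mapreduce` are hypothetical, chosen only for illustration, and the in-memory grouping step stands in for the distributed shuffle that a real MapReduce system would perform across cluster nodes.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_word_count(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """User-defined map function (hypothetical): emit (word, 1) per word."""
    for word in text.lower().split():
        yield word, 1


def reduce_word_count(word: str, counts: List[int]) -> Tuple[str, int]:
    """User-defined reduce function (hypothetical): sum the counts of a word."""
    return word, sum(counts)


def run_mapreduce(
    records: Iterable[Tuple[str, str]],
    map_fn: Callable[[str, str], Iterable[Tuple[str, int]]],
    reduce_fn: Callable[[str, List[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Run map, group by key, then reduce, all in one process.

    Assumption: a real framework would partition the records, run map tasks
    in parallel, and shuffle intermediate pairs over the network. Here the
    grouping is a plain in-memory dictionary.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    # Apply the reduce function once per distinct intermediate key.
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    docs = [("d1", "a b a"), ("d2", "b c")]
    print(run_mapreduce(docs, map_word_count, reduce_word_count))
```

Only the two user-defined functions change from one application to another in this sketch; the driver stays the same, which is the sense in which the interface can be customized by application developers.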