Tuple MapReduce and Pangool: an associated implementation

Authors: Ferrera Pedro; De Prado Ivan; Palacios Eric; Fernandez Marquez Jose Luis*; Serugendo Giovanna Di Marzo
Source: Knowledge and Information Systems, 2014, 41(2): 531-557.
DOI: 10.1007/s10115-013-0705-z

Abstract

This paper presents Tuple MapReduce, a new foundational model extending MapReduce with the notion of tuples. Tuple MapReduce bridges the gap between the low-level constructs provided by MapReduce and the higher-level needs of programmers, such as compound records, sorting, or joins. The paper also presents Pangool, an open-source framework implementing Tuple MapReduce. Pangool eases the design and implementation of applications based on MapReduce and increases their flexibility while maintaining Hadoop's performance. Additionally, the paper provides pseudocode for relational joins, rollup, and the PageRank algorithm; a Pangool code example; benchmark results comparing Pangool with existing approaches; reports from users of Pangool in industry; and a description of a distributed database exploiting Pangool. These results show that Tuple MapReduce can be used as a direct, better-suited replacement for the MapReduce model in current implementations without the need to modify key system fundamentals.
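
To make the idea of compound records and sorting concrete, the following is a minimal single-process sketch in Python. It is not Pangool's API and is not taken from the paper; the function `tuple_map_reduce`, its `group_by` and `sort_by` parameters, and the sample visit data are all hypothetical. It assumes that a tuple-based job groups tuples on a subset of their fields while sorting on a longer field list that starts with the grouping fields.

```python
from itertools import groupby
from operator import itemgetter


def tuple_map_reduce(records, map_fn, reduce_fn, group_by, sort_by):
    """Simulate one tuple-based MapReduce job in a single process.

    group_by: field names that define a reduce group (hypothetical parameter).
    sort_by:  field names that order the tuples; assumed to start with group_by.
    """
    if list(sort_by[:len(group_by)]) != list(group_by):
        raise ValueError("sort_by must start with the group_by fields")

    # Map phase: every input record may emit any number of tuples (dicts here).
    tuples = [t for record in records for t in map_fn(record)]

    # Shuffle phase: sort on all sort_by fields, then group on the prefix.
    tuples.sort(key=itemgetter(*sort_by))

    # Reduce phase: each group arrives already ordered by the extra sort fields.
    output = []
    for key, group in groupby(tuples, key=itemgetter(*group_by)):
        output.extend(reduce_fn(key, list(group)))
    return output


def map_visit(line):
    # Parse a "url,day,visits" line into a compound record.
    url, day, visits = line.split(",")
    yield {"url": url, "day": day, "visits": int(visits)}


def reduce_url(url, group):
    # Tuples are ordered by day, so group[0] holds the earliest day.
    yield (url, group[0]["day"], sum(t["visits"] for t in group))


lines = [
    "b.com,2014-01-02,3",
    "a.com,2014-01-03,1",
    "a.com,2014-01-01,5",
    "b.com,2014-01-01,2",
]
result = tuple_map_reduce(
    lines, map_visit, reduce_url, group_by=["url"], sort_by=["url", "day"]
)
print(result)
```

In plain key/value MapReduce, obtaining the same per-group ordering would typically require packing `url` and `day` into a custom composite key by hand; the sketch only illustrates why declaring fields on a tuple is more convenient.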

  • Publication date: 2014-11
