MtMR: Ensuring MapReduce Computation Integrity with Merkle Tree-Based Verifications

Authors: Wang, Yongzhi*; Shen, Yulong; Wang, Hua; Cao, Jinli; Jiang, Xiaohong
Source: IEEE Transactions on Big Data, 2018, 4(3): 418-431.
DOI: 10.1109/TBDATA.2016.2599928

Abstract

Big data applications have made a significant impact in recent years, thanks to the rapid growth of cloud computing and big data infrastructures. However, the public cloud is still not widely accepted for big data computing because of concerns about its security. Result integrity is one of the most significant security problems in cloud-based big data computing. In this paper, we propose MtMR, a Merkle tree-based verification method that assures high result integrity of MapReduce jobs. MtMR overlays MapReduce on a hybrid cloud environment and applies two rounds of Merkle tree-based verification, one on the pre-reduce phase (i.e., the map phase and the shuffle phase) and one on the reduce phase. In each round, MtMR samples a small portion of reduce task input/output records on the private cloud and performs Merkle tree-based verification on all the task input/output records. Based on the design of MtMR, we perform a series of theoretical studies to analyze its security and performance overhead. Our results indicate that MtMR is a promising method, offering high result integrity at low performance overhead. For example, by setting the sampled record ratio to an optimal value, MtMR can guarantee no more than 10 incorrect records in each reduce task by sampling only 4 percent of the records in that task.
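The general idea of checking sampled records against a Merkle tree commitment can be illustrated with a minimal Python sketch. This is not the MtMR protocol itself, whose details the abstract does not give; the function names (`build_levels`, `proof`, `verify`), the SHA-256 hash, the record format, and the rule of duplicating the last node on odd-sized levels are all assumptions made for illustration. The sketch shows an untrusted worker committing to a non-empty list of records with a single root hash, and a trusted verifier checking a few sampled records against that root.

```python
import hashlib
import random


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(records):
    """Return all tree levels, leaves first; the last level holds the root.

    Assumes a non-empty list of byte-string records (illustrative format).
    """
    level = [h(b"\x00" + r) for r in records]  # 0x00 prefix marks a leaf
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # assumption: duplicate the last node
        # 0x01 prefix marks an inner node
        level = [h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def proof(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    path = []
    for level in levels[:-1]:
        sib = index ^ 1
        # On an odd-sized level the last node is paired with itself.
        sibling = level[sib] if sib < len(level) else level[index]
        path.append((sibling, index % 2 == 0))  # True: sibling is on the right
        index //= 2
    return path


def verify(record, path, root):
    """Recompute the root from one record and its proof, then compare."""
    node = h(b"\x00" + record)
    for sibling, sibling_on_right in path:
        if sibling_on_right:
            node = h(b"\x01" + node + sibling)
        else:
            node = h(b"\x01" + sibling + node)
    return node == root


# Untrusted side: commit to all task records with one root hash.
records = [f"key{i}\t{i * i}".encode() for i in range(8)]
levels = build_levels(records)
root = levels[-1][0]

# Trusted side: sample a few records and check each against the root.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sampled = rng.sample(range(len(records)), 2)
sample_ok = all(verify(records[i], proof(levels, i), root) for i in sampled)

# A record altered after the commitment no longer matches the root.
tamper_detected = not verify(b"key3\t999", proof(levels, 3), root)
```

In this sketch the verifier only needs the root, the sampled records, and a logarithmic number of sibling hashes per record, so the cost of each check stays small even when the committed record set is large. How MtMR chooses the sample and combines the two verification rounds is described in the paper, not here.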