PADP: A parallel data possession audit model for cloud storage

Authors: Yang, Lei*; Xu, Kui; Liu, Shi
Source: Concurrency and Computation: Practice and Experience (CCPE), 2017, 29(20): e4154.
DOI:10.1002/cpe.4154

Abstract

When facing massive statistical data, the k-means algorithm struggles to meet data-processing demands because it lacks an effective parallel mechanism. This paper proposes an improved k-means algorithm (IMR-KCA) that performs clustering analysis on medical data using the MapReduce computing framework. First, by analyzing the heavy redundancy in traditional k-means algorithms, we propose a selection model that simplifies computations involving multiple cluster centers, and we prove the correctness of this model through several theorems. Second, we provide a method to calculate the distances from extreme points to center points, replacing the original Euclidean distance with the Manhattan distance; a group of theorems is proposed to prove the correctness of this simplification. Next, we provide a set of implementation algorithms that parallelize the clustering computation with the MapReduce framework. Finally, experimental results show that IMR-KCA is more reliable and efficient than a direct MapReduce parallelization of traditional clustering algorithms.
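To make the general data flow concrete, here is a minimal single-process sketch of one way k-means can be expressed as map and reduce steps with Manhattan-distance assignment. It is an illustration under stated assumptions, not IMR-KCA itself: it does not reproduce the paper's selection model or extreme-point method, the function names (`manhattan`, `map_phase`, `reduce_phase`, `kmeans_mapreduce`) are hypothetical, and the center update keeps the arithmetic mean of standard k-means, which is an assumption because the abstract does not say how centers are recomputed.

```python
from collections import defaultdict
from functools import reduce


def manhattan(p, q):
    """L1 (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def map_phase(point, centers):
    """Mapper: emit (index of nearest center, (point, count=1))."""
    nearest = min(range(len(centers)), key=lambda i: manhattan(point, centers[i]))
    return nearest, (point, 1)


def combine(a, b):
    """Combiner/reducer step: add coordinate sums and point counts."""
    (sum_a, count_a), (sum_b, count_b) = a, b
    return tuple(x + y for x, y in zip(sum_a, sum_b)), count_a + count_b


def reduce_phase(pairs):
    """Group mapper output by center index and average each group (assumed mean update)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    new_centers = {}
    for key, values in grouped.items():
        total, count = reduce(combine, values)
        new_centers[key] = tuple(x / count for x in total)
    return new_centers


def kmeans_mapreduce(points, centers, max_iter=20):
    """Iterate simulated map/reduce rounds until the centers stop changing."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # In a real MapReduce job, points would be split across many mappers.
        pairs = [map_phase(p, centers) for p in points]
        updated = reduce_phase(pairs)
        # A center that received no points keeps its previous position.
        new_centers = [updated.get(i, centers[i]) for i in range(len(centers))]
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

For example, `kmeans_mapreduce([(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)], [(1, 1), (9, 9)])` converges after two rounds to centers at (4/3, 4/3) and (25/3, 25/3). Emitting partial sums and counts rather than raw points is what lets a combiner pre-aggregate on each mapper node; note that with Manhattan assignment the mean is not the L1-optimal center (the coordinate-wise median is), so a k-medians update would be the stricter choice.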