Abstract

This paper presents PVDB, a parallel varied-density-based clustering algorithm with optimized data partitioning. First, we improve the partition with reduced boundary points algorithm using shared nearest neighbour (SNN) methods, yielding the reachable partition with reduced boundary points algorithm. Second, we introduce a layered grouping grid structure and propose an efficient k nearest neighbour (kNN) search method built on it. This method speeds up kNN searches and determines whether a point's kNNs lie within its own partition. Third, we propose a new merging strategy, based on the concept of reachable points, for connecting clusters that span different partitions. This strategy also avoids using SNN to connect clusters of different densities, a problem that arises in SNN-based clustering methods. Our algorithm is implemented in the MapReduce paradigm and compared with DBSCAN-MR and GriDBSCAN; it shows better varied-density clustering capability and scalability. In addition, varied applications demonstrate that our algorithm can discern spatial patterns and can be extended to many fields.
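
For readers unfamiliar with the SNN notion used throughout, the following minimal Python sketch illustrates the generic idea of shared-nearest-neighbour similarity: two points are similar to the extent that their kNN lists overlap. It is not the PVDB implementation. The function names `knn_indices` and `snn_similarity` are hypothetical, and the brute-force kNN search stands in for the layered grouping grid search, which is not reproduced here.

```python
import numpy as np


def knn_indices(points, k):
    """Return an (n, k) array of each point's k nearest neighbours.

    Brute-force O(n^2) search, used only for illustration (assumption:
    Euclidean distance; a point is never its own neighbour).
    """
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # exclude self-matches
    return np.argsort(dists, axis=1, kind="stable")[:, :k]


def snn_similarity(knn, i, j):
    """Count the neighbours shared by the kNN lists of points i and j."""
    return len(set(knn[i].tolist()) & set(knn[j].tolist()))


# Two well-separated unit squares of four points each.
pts = np.array(
    [[0, 0], [0, 1], [1, 0], [1, 1],
     [10, 10], [10, 11], [11, 10], [11, 11]],
    dtype=float,
)
knn = knn_indices(pts, k=3)
print(snn_similarity(knn, 0, 1))  # same square: 2 shared neighbours
print(snn_similarity(knn, 0, 4))  # different squares: 0 shared neighbours
```

Because the similarity depends on neighbour overlap rather than on absolute distance, it is comparatively insensitive to local density, which is the usual motivation for SNN in varied-density clustering.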