Abstract

Clustering has been applied in many areas. It is an unsupervised learning method that tries to find distributions and patterns in unlabeled data sets. Although clustering algorithms have been studied for decades, none is all-purpose. This paper presents a new clustering algorithm, Clustering based on Near Neighbor Influence (CNNI); an improved version of CNNI with lower time cost (ICNNI); and a variation of CNNI (VCNNI). They are inspired by the idea of near neighbors and the superposition principle of influence. To describe the three algorithms clearly, the paper lists three basic concepts (near neighbor point set, grid cell, and near neighbor grid cell set) and introduces two important concepts (near neighbor influence and a kind of similarity measure). In the simulations, four well-known clustering algorithms (K-Means, FCM, AP, and DBSCAN) are used for comparison. From simulated experiments on several artificial and real data sets, we observed that CNNI, ICNNI, and VCNNI can find the obvious clusters and obtain clustering results better than (or similar to) those of K-Means, FCM, and AP on some data sets. We also observed that ICNNI is faster than CNNI while producing the same clustering results; CNNI and ICNNI are faster than AP with better or similar clustering quality; CNNI needs less space than VCNNI and DBSCAN; and VCNNI obtains clustering results similar to those of DBSCAN. In particular, CNNI, ICNNI, and VCNNI can easily find noise points and isolated points. Finally, the paper gives several suggestions for future research.