Abstract

The K-means clustering algorithm is one of the best-known partitioning clustering techniques and has been widely applied in many fields. Although it is simple and fast, the method has two main drawbacks. First, it requires the number of clusters to be specified, which is difficult to know in advance for many real data sets. Second, it often produces different clustering results from run to run because the initial seeds are chosen randomly. To solve these problems, this paper proposes an adaptive clustering algorithm. The new algorithm partitions a given data set repeatedly. In each partition step, it selects the initial seeds by the max-min distance criterion, so that the clustering result is deterministic, and it evaluates the risk of the clustering result by extending Bayesian decision theory to the field of clustering. By comparing the risk values before and after a partition, the algorithm decides whether the data set should be partitioned further; in this way it determines the number of clusters and obtains the final clustering result automatically. The performance of the proposed algorithm has been studied on several synthetic and real-world data sets. The experimental results show that the new algorithm, which requires no parameters to be specified by the user in advance, obtains effective clustering results.
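
The max-min distance seeding mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `max_min_seeds`, the use of Euclidean distance, and the rule for choosing the first seed (the point farthest from the centroid) are assumptions made for illustration, since the abstract does not give these details. The risk-based stopping rule is not shown because the abstract does not define it.

```python
import math


def max_min_seeds(points, k):
    """Pick k seeds so that each new seed is the point whose distance
    to its nearest already-chosen seed is largest (max-min distance)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    # Assumption: start from the point farthest from the centroid, so the
    # selection is deterministic. The paper's own starting rule may differ.
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    first = max(range(len(points)),
                key=lambda i: math.dist(points[i], centroid))
    seeds = [first]

    # min_dist[i] is the distance from point i to its nearest chosen seed.
    min_dist = [math.dist(p, points[first]) for p in points]

    while len(seeds) < k:
        # The next seed maximizes the minimum distance to the chosen seeds.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        seeds.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))

    return [points[i] for i in seeds]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (9, 0), (9, 1), (5, 20)]
    print(max_min_seeds(data, 3))
```

Because no random choice is involved, repeated calls on the same data return the same seeds, which is the property the abstract relies on to obtain a stable clustering result.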
