Abstract

This paper proposes a two-stage genetic clustering algorithm (TGCA) that automatically determines both the proper number of clusters and the proper partition of a given data set. Two-stage selection and mutation operations exploit the search capability of the algorithm by adjusting the selection and mutation probabilities according to the consistency of the number of clusters in the population. The TGCA first focuses on searching for the best number of clusters and then gradually shifts toward finding the globally optimal cluster centers. Furthermore, a maximum attribute range partition approach is used for population initialization to overcome the sensitivity of clustering algorithms to initial partitions. Finally, the efficiency of the TGCA is compared extensively with several automatic clustering algorithms, including hierarchical agglomerative k-means, an automatic spectral algorithm, and a standard genetic k-means clustering algorithm (SGKC). Experimental results on four artificial and seven real-life data sets show that the TGCA performs better in finding the number of clusters and achieves higher clustering accuracy.
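To make the initialization idea concrete, the following is a minimal sketch of one plausible reading of a maximum attribute range partition. The abstract does not give the exact procedure, so everything here is an assumption: the function name `max_range_init` is hypothetical, and the sketch assumes the attribute with the largest range is split into k equal-width intervals, with the mean of the points in each non-empty interval serving as an initial cluster center.

```python
from typing import List, Sequence


def max_range_init(data: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Hypothetical sketch: derive up to k initial centers by splitting the
    attribute with the largest range into k equal-width intervals.
    This is an assumed interpretation, not the procedure from the paper."""
    if k < 1 or not data:
        raise ValueError("need k >= 1 and a non-empty data set")
    dims = len(data[0])

    # Find the attribute (column) with the largest max-min range.
    ranges = [max(p[j] for p in data) - min(p[j] for p in data) for j in range(dims)]
    widest = max(range(dims), key=lambda j: ranges[j])
    low = min(p[widest] for p in data)
    width = ranges[widest] / k

    # Assign each point to one of k equal-width intervals along that attribute.
    bins: List[List[Sequence[float]]] = [[] for _ in range(k)]
    for p in data:
        idx = 0 if width == 0 else min(int((p[widest] - low) / width), k - 1)
        bins[idx].append(p)

    # The center of each non-empty interval is the mean of its points.
    return [[sum(p[j] for p in b) / len(b) for j in range(dims)] for b in bins if b]


if __name__ == "__main__":
    points = [[0.0, 0.0], [1.0, 0.5], [9.0, 0.2], [10.0, 0.4]]
    print(max_range_init(points, 2))
```

Because the starting centers depend only on the data and on k, an initialization of this kind is deterministic, which is one way to reduce sensitivity to random initial partitions. Empty intervals are dropped in this sketch, so fewer than k centers may be returned.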