摘要

In this paper, a new symmetry-based genetic clustering algorithm is proposed which automatically evolves the number of clusters as well as the proper partitioning from a data set. Strings comprise both real numbers and the don't care symbol in order to encode a variable number of clusters. Here, assignment of points to different clusters are done based on a point symmetry (PS)-based distance rather than the euclidean distance. A newly proposed PS-based cluster validity index, Sym-index, is used as a measure of the validity of the corresponding partitioning. The algorithm is, therefore, able to detect both convex and nonconvex clusters irrespective of their sizes and shapes as long as they possess the symmetry property. Kd-tree-based nearest neighbor search is used to reduce the complexity of computing PS-based distance. A proof on the convergence property of variable string length genetic algorithm with PS-distance-based clustering (VGAPS-clustering) technique is also provided. The effectiveness of VGAPS-clustering compared to variable string length Genetic K-means algorithm (GCUK-clustering) and one recently developed weighted sum validity function-based hybrid niching genetic algorithm (HNGA-clustering) is demonstrated for nine artificial and five real-life data sets.

  • 出版日期2008-11