摘要

This article presents a multi-objective genetic algorithm which considers the problem of data clustering. A given dataset is automatically assigned into a number of groups in appropriate fuzzy partitions through the fuzzy c-means method. This work has tried to exploit the advantage of fuzzy properties which provide capability to handle overlapping clusters. However, most fuzzy methods are based on compactness and/or separation measures which use only centroid information. The calculation from centroid information only may not be sufficient to differentiate the geometric structures of clusters. The overlap-separation measure using an aggregation operation of fuzzy membership degrees is better equipped to handle this drawback. For another key consideration, we need a mechanism to identify appropriate fuzzy clusters without prior knowledge on the number of clusters. From this requirement, an optimization with single criterion may not be feasible for different cluster shapes. A multi-objective genetic algorithm is therefore appropriate to search for fuzzy partitions in this situation. Apart from the overlap-separation measure, the well-known fuzzy J(m) index is also optimized through genetic operations. The algorithm simultaneously optimizes the two criteria to search for optimal clustering solutions. A string of real-coded values is encoded to represent cluster centers. A number of strings with different lengths varied over a range correspond to variable numbers of clusters. These real-coded values are optimized and the Pareto solutions corresponding to a tradeoff between the two objectives are finally produced. As shown in the experiments, the approach provides promising solutions in well-separated, hyperspherical and overlapping clusters from synthetic and real-life data sets. This is demonstrated by the comparison with existing single-objective and multi-objective clustering techniques.

  • 出版日期2014-11