Abstract

Data clustering is one of the most popular techniques in data mining. It partitions an unlabeled dataset into groups such that the objects in each group are similar to one another with respect to a chosen similarity measure and dissimilar to the objects in other groups. High-dimensional data clustering is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions. Such high-dimensional data spaces are often encountered in areas such as medicine, bioinformatics, biology, recommendation systems, and text document clustering. Many algorithms for large data sets have been proposed in the literature, using a variety of techniques. However, conventional algorithms have shortcomings such as slow convergence and sensitivity to initial values. Particle Swarm Optimization (PSO) is a population-based global search algorithm modeled on the social behavior of swarms. PSO produces better results on complicated, multi-peak problems. This paper presents a literature survey of the PSO algorithm and its variants as applied to clustering high-dimensional data. It is intended as a guide for researchers working on PSO and high-dimensional data clustering.

  • Publication date: 2015-6