Abstract

This paper proposes ATCPSO, an automatic clustering algorithm for text clustering based on particle swarm optimization (PSO). It builds on autoPSO, which evolves the correct number of clusters while simultaneously identifying the clusters themselves, and which has been shown to improve the performance of high-dimensional data clustering. Extending autoPSO to text clustering requires several modifications to the algorithm, including changes to the similarity measure, the parameter selection, and the criterion function. Experimental results on ten structured text datasets built from 20 Newsgroups and four text datasets selected from CLUTO show that the proposed algorithm substantially improves text clustering quality compared with four typical clustering algorithms and one competitive subspace clustering method.