An efficient Particle Swarm Optimization approach to cluster short texts

Authors: Cagnina Leticia*; Errecalde Marcelo; Ingaramo Diego; Rosso Paolo
Source: Information Sciences, 2014, 265: 36-49.
DOI:10.1016/j.ins.2013.12.010

Abstract

Short texts such as evaluations of commercial products, news, FAQs and scientific abstracts are important resources on the Web because people constantly need to use this online information in real life. In this context, clustering short texts is a significant analysis task, and a discrete Particle Swarm Optimization (PSO) algorithm named CLUDIPSO has recently shown promising performance on this type of problem. CLUDIPSO obtained high-quality results with small corpora, but its performance deteriorated significantly with larger corpora. This article presents CLUDIPSO*, an improved version of CLUDIPSO that uses a different particle representation, a more efficient evaluation of the function to be optimized, and some modifications to the mutation operator. Experimental results on corpora of scientific abstracts, news and short legal documents obtained from the Web show that CLUDIPSO* is an effective clustering method for small and medium-sized short-text corpora.
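The abstract names the ingredients of a discrete PSO clusterer (particle representation, objective-function evaluation, mutation operator) without detailing them. What follows is a minimal, hypothetical sketch of that general idea, not CLUDIPSO* itself. It assumes each particle is a label vector holding one cluster index per document, cosine distance between document vectors, and the silhouette coefficient as a stand-in fitness to maximize (the objective actually used by the authors is not given here). The probabilistic "copy from personal/global best" update, the random-reassignment mutation and the names `discrete_pso`, `p_mut` and `silhouette` are all illustrative assumptions.

```python
import math
import random


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def silhouette(docs, labels):
    """Mean silhouette coefficient (stand-in fitness); -1.0 if < 2 clusters used."""
    if len(set(labels)) < 2:
        return -1.0
    total = 0.0
    for i, di in enumerate(docs):
        # Sum and count of distances from doc i to every other doc, per cluster.
        sums, counts = {}, {}
        for j, dj in enumerate(docs):
            if i != j:
                c = labels[j]
                sums[c] = sums.get(c, 0.0) + cosine_distance(di, dj)
                counts[c] = counts.get(c, 0) + 1
        own = labels[i]
        if own not in counts:  # singleton cluster: silhouette taken as 0
            continue
        a = sums[own] / counts[own]  # mean intra-cluster distance
        b = min(sums[c] / counts[c] for c in counts if c != own)  # nearest other cluster
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(docs)


def discrete_pso(docs, k, n_particles=10, iters=50, p_mut=0.05, seed=0):
    """Hypothetical discrete PSO over label vectors (not the paper's operators)."""
    rng = random.Random(seed)
    n = len(docs)
    # Particle = label vector: position d holds the cluster of document d.
    swarm = [[rng.randrange(k) for _ in range(n)] for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]
    pbest_fit = [silhouette(docs, p) for p in swarm]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for idx, particle in enumerate(swarm):
            for d in range(n):
                r = rng.random()
                if r < 0.4:
                    particle[d] = pbest[idx][d]  # move toward personal best
                elif r < 0.8:
                    particle[d] = gbest[d]  # move toward global best
                if rng.random() < p_mut:
                    particle[d] = rng.randrange(k)  # mutation: random reassignment
            fit = silhouette(docs, particle)
            if fit > pbest_fit[idx]:
                pbest[idx], pbest_fit[idx] = particle[:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = particle[:], fit
    return gbest, gbest_fit


# Toy "documents": two obvious groups of term-weight vectors.
docs = [[1, 0, 0], [0.9, 0.1, 0], [1, 0.2, 0],
        [0, 0, 1], [0, 0.1, 0.9], [0.1, 0, 1]]
labels, fit = discrete_pso(docs, k=2)
print(labels, round(fit, 3))
```

Because the global best is replaced only on strict improvement, the returned fitness never decreases across iterations; the mutation step keeps the swarm from freezing once every particle has copied the global best.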

  • Publication date: 2014-5-1