Abstract

Text clustering is a difficult and active research area within Internet search engine research. This paper presents a new text clustering algorithm based on K-means and the Self-Organizing Map (SOM). First, the texts are preprocessed to meet the requirements of subsequent processing. Second, K-means is very sensitive to its initial cluster centers and to isolated (outlier) texts. To address this, the paper improves the K-means methods for selecting initial cluster centers and cluster seeds. Third, the advantages of K-means and SOM are combined into a new text clustering model. Finally, experimental results indicate that the improved algorithm is more accurate and more stable than the original algorithm.
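The abstract does not say which seed-selection rule the authors use. The following is therefore only a minimal sketch of one common way to make K-means less sensitive to initial centers and isolated points. It is not the paper's method, and it omits the SOM stage entirely. Several choices here are assumptions:

* Density is the number of neighbors within a fixed `radius`.
* Points with fewer than `min_neighbors` neighbors count as isolated. They are never chosen as seeds, are excluded from center updates, and receive the label `-1`.
* Seeds are picked by starting from the densest point and then repeatedly taking the candidate farthest from the seeds already chosen.

The function names and the parameters `radius` and `min_neighbors` are hypothetical. In text clustering, the input vectors would typically be document vectors such as TF-IDF weights, which are also assumed here.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def neighbor_counts(points: List[Point], radius: float) -> List[int]:
    """Hypothetical density estimate: count of other points within `radius`."""
    n = len(points)
    return [
        sum(1 for j in range(n) if j != i and math.dist(points[i], points[j]) <= radius)
        for i in range(n)
    ]


def select_seeds(points: List[Point], density: List[int], k: int,
                 min_neighbors: int) -> List[List[float]]:
    """Pick k initial centers among non-isolated points: the densest point first,
    then repeatedly the candidate farthest from its nearest already-chosen seed."""
    candidates = [i for i, d in enumerate(density) if d >= min_neighbors]
    if len(candidates) < k:
        raise ValueError("not enough non-isolated points to choose k seeds")
    seeds = [max(candidates, key=lambda i: density[i])]
    while len(seeds) < k:
        seeds.append(max(
            candidates,
            key=lambda i: min(math.dist(points[i], points[s]) for s in seeds),
        ))
    return [list(points[i]) for i in seeds]


def robust_kmeans(points: List[Point], k: int, radius: float, min_neighbors: int,
                  max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """K-means run only on non-isolated points; isolated points are labeled -1
    so they can neither become seeds nor drag the cluster means."""
    density = neighbor_counts(points, radius)
    centers = select_seeds(points, density, k, min_neighbors)
    core = [i for i, d in enumerate(density) if d >= min_neighbors]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = list(labels)
        for i in core:
            # Assign each non-isolated point to its nearest center.
            new_labels[i] = min(range(k), key=lambda c: math.dist(points[i], centers[c]))
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [points[i] for i in core if labels[i] == c]
            if members:  # keep the previous center if a cluster is empty
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centers, labels


if __name__ == "__main__":
    # Two tight groups plus one isolated point at (100, 100).
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11), (100, 100)]
    centers, labels = robust_kmeans(pts, k=2, radius=2.0, min_neighbors=2)
    print(centers)  # [[0.5, 0.5], [10.5, 10.5]]
    print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Now consider plain K-means on the same data seeded with (0, 0) and (11, 11). The point (100, 100) joins the second cluster and pulls its mean to (28.4, 28.4). On the next pass, the whole second group is reassigned to the first cluster. Filtering isolated points before seeding and updating avoids this failure, at the cost of two extra parameters that must be tuned to the data.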

  • Publication date: 2012