Abstract
The K-Means algorithm has several advantages, such as its simple concept and stable efficiency on very large data sets. However, it also has several shortcomings. Selecting the initial clusters, determining the number of clusters, and eliminating the interference of outliers are three important issues in improving K-Means, yet most methods proposed in the literature address only one of them. In this paper, we propose a two-phase clustering method that modifies the initialization of the K-Means algorithm and accomplishes the following simultaneously: (1) automatically determining a proper number of clusters, (2) choosing better initial clusters, and (3) reducing the influence of outliers on the clustering result.
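To make the initialization problem concrete, here is a minimal sketch of standard (Lloyd-style) K-Means, not the two-phase method proposed in this paper. It assumes the number of clusters is fixed in advance by passing the initial centers explicitly; the function names (`kmeans`, `sse`, etc.) and the six-point data set are hypothetical illustrations. Running it twice on the same data with different initial centers shows that plain K-Means can converge to a much worse partition, which is the weakness the abstract's point (2) targets.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def update(points: Sequence[Point], labels: List[int],
           centers: Sequence[Point]) -> List[Point]:
    """Move each center to the mean of its assigned points."""
    new_centers: List[Point] = []
    for j, c in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            dim = len(members[0])
            new_centers.append(tuple(sum(m[d] for m in members) / len(members)
                                     for d in range(dim)))
        else:
            # Hypothetical choice: an empty cluster keeps its old center.
            new_centers.append(tuple(c))
    return new_centers


def kmeans(points: Sequence[Point], init_centers: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Plain K-Means; k is fixed by len(init_centers)."""
    centers: List[Point] = [tuple(c) for c in init_centers]
    labels = assign(points, centers)
    for _ in range(max_iter):
        centers = update(points, labels, centers)
        new_labels = assign(points, centers)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centers, labels


def sse(points: Sequence[Point], centers: Sequence[Point],
        labels: List[int]) -> float:
    """Sum of squared distances from points to their cluster centers."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


# Hypothetical data: three well-separated pairs of points.
points: List[Point] = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]

good_c, good_l = kmeans(points, [(0, 0), (10, 0), (20, 0)])
bad_c, bad_l = kmeans(points, [(0, 0), (0, 1), (15, 0)])
print(sse(points, good_c, good_l))  # → 1.5
print(sse(points, bad_c, bad_l))    # → 101.0
```

With the second set of initial centers, two centers are wasted on one natural group and the other two groups are merged, and the algorithm stops there because the assignments no longer change. This local-optimum behavior is why the choice of initial clusters matters.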
- Publication date: 2010-6
- Affiliation: Donghua University