A hybrid approach to speed-up the k-means clustering method

作者:Sarma T Hitendra*; Viswanath P; Reddy B Eswara
来源:International Journal of Machine Learning and Cybernetics, 2013, 4(2): 107-117.
DOI:10.1007/s13042-012-0079-7

摘要

k-means clustering method is an iterative partition-based method which for finite data-sets converges to a solution in a finite time. The running time of this method grows linearly with respect to the size of the data-set. Many variants have been proposed to speed-up the conventional k-means clustering method. In this paper, we propose a prototype-based hybrid approach to speed-up the k-means clustering method. The proposed method, first partitions the data-set into small clusters (grouplets), which are of varying sizes. Each grouplet is represented by a prototype. Later, the set of prototypes is partitioned into k clusters using the modified k-means method. The modified k-means clustering method is similar to the conventional k-means method but it avoids empty clusters (the clusters to which no pattern is assigned) in the iterative process. In each cluster of prototypes, each prototype is replaced by its corresponding set of patterns (which formed the grouplet) to derive a partition of the data-set. Since this partition of the data-set can deviate from the partition obtained using the conventional k-means method over the entire data-set, a correcting step is proposed. Both theoretically and experimentally, the conventional k-means method and the proposed hybrid method (augmented with the correcting step) are shown to yield the same result (provided, the initial k seed points are same). But, the proposed method is much faster than the conventional one. Experimentally, the proposed method is compared with the conventional method and the other recent methods that are proposed to speed-up the k-means method.

  • 出版日期2013-4