Large-scale k-means clustering via variance reduction

Zhao, Yawei*; Ming, Yuewei; Liu, Xinwang; Zhu, En; Zhao, Kaikai; Yin, Jianping

doi:10.1016/j.neucom.2018.03.059

Abstract

As the volume of data such as web images grows, performing k-means clustering efficiently on millions or even billions of images becomes challenging. One reason is that k-means uses the full batch of training data to update the cluster centers at every iteration, which is time-consuming. Conventionally, k-means is accelerated by using a single instance or a mini-batch of instances to update the centers, but this degrades performance because of stochastic noise. In this paper, we reduce this stochastic noise and accelerate k-means using a variance reduction technique. Specifically, we propose a position correction mechanism to correct the drift of the cluster centers, and propose a variance-reduced k-means method named VRKM. Furthermore, we optimize VRKM by reducing its computational cost, and propose a new variant named VRKM++. Compared with VRKM, VRKM++ does not have to compute the batch gradient and is more efficient. Extensive empirical studies show that VRKM and VRKM++ outperform the state-of-the-art method, obtaining about 2× and 4× speedups for large-scale clustering, respectively. The source code is available at https://www.github.com/YaweiZhao/VRKM.sofia-ml.
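To make the idea of variance reduction for stochastic k-means updates concrete, the following is a minimal sketch of a generic SVRG-style scheme, not the authors' VRKM or VRKM++ (the abstract does not specify the position correction mechanism, and VRKM++ is stated to avoid the batch gradient that this sketch computes). The function names, the objective scaling, the step size, and the batch size are all assumptions chosen for illustration.

```python
import numpy as np

def assign(X, C):
    """Index of the nearest center in C for each row of X."""
    d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def grad(X, C):
    """Average gradient of 0.5 * ||x - nearest center||^2 over the rows of X."""
    labels = assign(X, C)
    G = np.zeros_like(C)
    # Each point contributes (c_j - x) to the gradient of its assigned center j.
    np.add.at(G, labels, C[labels] - X)
    return G / len(X)

def objective(X, C):
    """Mean of 0.5 * squared distance to the nearest center."""
    labels = assign(X, C)
    return 0.5 * ((X - C[labels]) ** 2).sum(axis=1).mean()

def vr_kmeans_sketch(X, C0, epochs=10, inner_steps=100, batch=16, lr=0.5, seed=0):
    """SVRG-style variance-reduced k-means (illustrative assumption, not VRKM)."""
    rng = np.random.default_rng(seed)
    C = np.array(C0, dtype=float)
    for _ in range(epochs):
        # Snapshot the centers and compute one full-batch gradient per epoch.
        C_snap = C.copy()
        G_full = grad(X, C_snap)
        for _ in range(inner_steps):
            B = X[rng.integers(0, len(X), size=batch)]
            # Mini-batch gradient corrected by the snapshot gradient: the
            # correction cancels most of the sampling noise near the snapshot.
            g = grad(B, C) - grad(B, C_snap) + G_full
            C -= lr * g
    return C
```

On two well-separated synthetic clusters, calling `vr_kmeans_sketch(X, C0)` with rough initial centers should move each center close to its cluster mean; `objective(X, C)` can be used to compare the result against the starting point.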

Publication date: 2018-9-13
Affiliations: Dongguan University of Technology; National University of Defense Technology
