Large-scale k-means clustering via variance reduction

被引：21

作者：

Zhao, Yawei ^{[1
]}

Ming, Yuewei ^{[1
]}

Liu, Xinwang ^{[1
]}

Zhu, En ^{[1
]}

Zhao, Kaikai ^{[2
]}

Yin, Jianping ^{[3
]}

机构：

[1] Natl Univ Def Technol, Changsha, Hunan, Peoples R China

[2] Naval Aeronaut Univ, Yantai, Shandong, Peoples R China

[3] Dongguan Univ Technol, Dongguan, Guangdong, Peoples R China

来源：

NEUROCOMPUTING | 2018年 / 307卷

基金：

中国国家自然科学基金;

关键词：

k-Means clustering; Large-scale clustering; Variance reduction;

D O I：

10.1016/j.neucom.2018.03.059

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

With the increase of the volume of data such as images in web, it is challenging to perform k-means clustering on millions or even billions of images efficiently. One of the reasons is that k-means requires to use a batch of training data to update cluster centers at every iteration, which is time-consuming. Conventionally, k-means is accelerated by using one or a mini-batch of instances to update the centers, which leads to a bad performance due to the stochastic noise. In the paper, we decrease such stochastic noise, and accelerate k-means by using variance reduction technique. Specifically, we propose a position correction mechanism to correct the drift of the cluster centers, and propose a variance reduced k-means named VRKM. Furthermore, we optimize VRKM by reducing its computational cost, and propose a new variant of the variance reduced k-means named VRKM++. Comparing with VRKM, VRKM++ does not have to compute the batch gradient, and is more efficient. Extensive empirical studies show that our methods VRKM and VRKM++ outperform the state-of-the-art method, and obtain about 2 x and 4 speedups for large-scale clustering, respectively. The source code is available at https://www.github.com/YaweiZhao/VRKM.sofia-ml. (C) 2018 Elsevier B.V. All rights reserved.

引用

页码：184 / 194

页数：11

共 26 条

[1]

[Anonymous], 2013, Adv. Neural Inf. Process. Syst.

[2]

[Anonymous], 2010, P 19 INT C WORLD WID, DOI [DOI 10.1145/1772690.1772862, 10.1145/1772690.1772862]

[3]

[Anonymous], 2015, Very Deep Convolu- tional Networks for Large-Scale Image Recognition

[4]

[Anonymous], P INT C MACH LEARN

[5]

[Anonymous], 2002, P 8 ACM SIGKDD INT C, DOI DOI 10.1145/775047.775138

[6]

[Anonymous], P AAAI C ART INT

[7]

[Anonymous], 2009, Learning multiple layers of features from tiny images

[8]

[Anonymous], 2007, CVPR

[9]

[Anonymous], 2016, ARXIV160604838

[10]

[Anonymous], P INT JOINT C ART IN

← 1 2 3 →