Competitive K-means

被引:32
|
作者
Esteves, Rui Maximo [1 ]
Hacker, Thomas [2 ]
Rong, Chunming [1 ]
机构
[1] Univ Stavanger, Dept Elect & Comp Engn, Stavanger, Norway
[2] Purdue Univ, Comp & Informat Technol, W Lafayette, IN 47907 USA
来源
2013 IEEE FIFTH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM), VOL 1 | 2013年
关键词
K-means; K-means plus; Streaming K-means; MapReduce;
D O I
10.1109/CloudCom.2013.89
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The tremendous growth in data volumes has created a need for new tools and algorithms to quickly analyze large datasets. Cluster analysis techniques, such as K-means can be used for large datasets distributed across several machines. The accuracy of K-means depends on the selection of seed centroids during initialization. K-means++ improves on the K-means seeder, but suffers from problems when it is applied to large datasets: (a) the random algorithm it employs can produce inconsistent results across several analysis runs under the same initial conditions; and (b) it scales poorly for large datasets. In this paper we describe a new Competitive K-means algorithm we developed that addresses both of these problems. We describe an efficient MapReduce implementation of our new Competitive K-means algorithm that we found scales well with large datasets. We compared the performance of our new algorithm with three existing cluster analysis algorithms and found that our new algorithm improves cluster analysis accuracy and decreases variance. Our results show that our new algorithm produced a speedup of 76 +/- 9 times compared with the serial K-means++ and is as fast as the Streaming K-means. Our work provides a method to select a good initial seeding in less time, facilitating accurate cluster analysis over large datasets in shorter time.
引用
收藏
页码:17 / 24
页数:8
相关论文
共 50 条
  • [2] Comparison of K-means and K-means plus plus for image compression with thermographic images
    Biswas, Hridoy
    Umbaugh, Scott E.
    Marino, Dominic
    Sackman, Joseph
    THERMOSENSE: THERMAL INFRARED APPLICATIONS XLIII, 2021, 11743
  • [3] Vectorized Implementation of K-means
    Otsuka, Tomoki
    Fukushima, Norishige
    INTERNATIONAL WORKSHOP ON ADVANCED IMAGING TECHNOLOGY (IWAIT) 2021, 2021, 11766
  • [4] Efficient k-Means plus plus Approximation with MapReduce
    Xu, Yujie
    Qu, Wenyu
    Li, Zhiyang
    Min, Geyong
    Li, Keqiu
    Liu, Zhaobin
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2014, 25 (12) : 3135 - 3144
  • [5] Deep k-Means: Jointly clustering with k-Means and learning representations
    Fard, Maziar Moradi
    Thonet, Thibaut
    Gaussier, Eric
    PATTERN RECOGNITION LETTERS, 2020, 138 : 185 - 192
  • [6] Density K-means : A New Algorithm for Centers Initialization for K-means
    Lan, Xv
    Li, Qian
    Zheng, Yi
    PROCEEDINGS OF 2015 6TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE, 2015, : 958 - 961
  • [7] PSO Aided k-Means Clustering: Introducing Connectivity in k-Means
    Breaban, Mihaela Elena
    Luchian, Henri
    GECCO-2011: PROCEEDINGS OF THE 13TH ANNUAL GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 2011, : 1227 - 1234
  • [8] A bad instance for k-means plus
    Brunsch, Tobias
    Roeglin, Heiko
    THEORETICAL COMPUTER SCIENCE, 2013, 505 : 19 - 26
  • [9] COMPRESSIVE K-MEANS
    Keriven, Nicolas
    Tremblay, Nicolas
    Traonmilin, Yann
    Gribonval, Remi
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 6369 - 6373
  • [10] k-means: A revisit
    Zhao, Wan-Lei
    Deng, Cheng-Hao
    Ngo, Chong-Wah
    NEUROCOMPUTING, 2018, 291 : 195 - 206