Competitive K-means

被引:32
作者
Esteves, Rui Maximo [1 ]
Hacker, Thomas [2 ]
Rong, Chunming [1 ]
机构
[1] Univ Stavanger, Dept Elect & Comp Engn, Stavanger, Norway
[2] Purdue Univ, Comp & Informat Technol, W Lafayette, IN 47907 USA
来源
2013 IEEE FIFTH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM), VOL 1 | 2013年
关键词
K-means; K-means plus; Streaming K-means; MapReduce;
D O I
10.1109/CloudCom.2013.89
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The tremendous growth in data volumes has created a need for new tools and algorithms to quickly analyze large datasets. Cluster analysis techniques, such as K-means can be used for large datasets distributed across several machines. The accuracy of K-means depends on the selection of seed centroids during initialization. K-means++ improves on the K-means seeder, but suffers from problems when it is applied to large datasets: (a) the random algorithm it employs can produce inconsistent results across several analysis runs under the same initial conditions; and (b) it scales poorly for large datasets. In this paper we describe a new Competitive K-means algorithm we developed that addresses both of these problems. We describe an efficient MapReduce implementation of our new Competitive K-means algorithm that we found scales well with large datasets. We compared the performance of our new algorithm with three existing cluster analysis algorithms and found that our new algorithm improves cluster analysis accuracy and decreases variance. Our results show that our new algorithm produced a speedup of 76 +/- 9 times compared with the serial K-means++ and is as fast as the Streaming K-means. Our work provides a method to select a good initial seeding in less time, facilitating accurate cluster analysis over large datasets in shorter time.
引用
收藏
页码:17 / 24
页数:8
相关论文
共 50 条
  • [31] Constrained K-Means Classification
    Smyrlis, Panagiotis N.
    Tsouros, Dimosthenis C.
    Tsipouras, Markos G.
    ENGINEERING TECHNOLOGY & APPLIED SCIENCE RESEARCH, 2018, 8 (04) : 3203 - 3208
  • [32] Subspace K-means clustering
    Timmerman, Marieke E.
    Ceulemans, Eva
    De Roover, Kim
    Van Leeuwen, Karla
    BEHAVIOR RESEARCH METHODS, 2013, 45 (04) : 1011 - 1023
  • [33] Improving Bregman k-means
    Ashour, Wesam
    Fyfe, Colin
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2014, 6 (01) : 65 - 82
  • [34] Incremental k-Means Method
    Prasad, Rabinder Kumar
    Sarmah, Rosy
    Chakraborty, Subrata
    PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PREMI 2019, PT I, 2019, 11941 : 38 - 46
  • [35] Comparative Study of K-Means, Pam and Rough K-Means Algorithms Using Cancer Datasets
    Kumar, Parvesh
    Wasan, Krishan
    COMPUTING, COMMUNICATION, AND CONTROL, 2011, 1 : 136 - 140
  • [36] Three-way k-means: integrating k-means and three-way decision
    Pingxin Wang
    Hong Shi
    Xibei Yang
    Jusheng Mi
    International Journal of Machine Learning and Cybernetics, 2019, 10 : 2767 - 2777
  • [37] Recombinator-k-Means: An Evolutionary Algorithm That Exploits k-Means plus plus for Recombination
    Baldassi, Carlo
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2022, 26 (05) : 991 - 1003
  • [38] Apache Mahout's k-Means vs. Fuzzy k-Means Performance Evaluation
    Xhafa, Fatos
    Bogza, Adriana
    Caballe, Santi
    Barolli, Leonard
    2016 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT NETWORKING AND COLLABORATIVE SYSTEMS (INCOS), 2016, : 110 - 116
  • [39] Global k-means plus plus : an effective relaxation of the global k-means clustering algorithm
    Vardakas, Georgios
    Likas, Aristidis
    APPLIED INTELLIGENCE, 2024, 54 (19) : 8876 - 8888
  • [40] Three-way k-means: integrating k-means and three-way decision
    Wang, Pingxin
    Shi, Hong
    Yang, Xibei
    Mi, Jusheng
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2019, 10 (10) : 2767 - 2777