Competitive K-means

被引:32
|
作者
Esteves, Rui Maximo [1 ]
Hacker, Thomas [2 ]
Rong, Chunming [1 ]
机构
[1] Univ Stavanger, Dept Elect & Comp Engn, Stavanger, Norway
[2] Purdue Univ, Comp & Informat Technol, W Lafayette, IN 47907 USA
来源
2013 IEEE FIFTH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM), VOL 1 | 2013年
关键词
K-means; K-means plus; Streaming K-means; MapReduce;
D O I
10.1109/CloudCom.2013.89
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The tremendous growth in data volumes has created a need for new tools and algorithms to quickly analyze large datasets. Cluster analysis techniques, such as K-means can be used for large datasets distributed across several machines. The accuracy of K-means depends on the selection of seed centroids during initialization. K-means++ improves on the K-means seeder, but suffers from problems when it is applied to large datasets: (a) the random algorithm it employs can produce inconsistent results across several analysis runs under the same initial conditions; and (b) it scales poorly for large datasets. In this paper we describe a new Competitive K-means algorithm we developed that addresses both of these problems. We describe an efficient MapReduce implementation of our new Competitive K-means algorithm that we found scales well with large datasets. We compared the performance of our new algorithm with three existing cluster analysis algorithms and found that our new algorithm improves cluster analysis accuracy and decreases variance. Our results show that our new algorithm produced a speedup of 76 +/- 9 times compared with the serial K-means++ and is as fast as the Streaming K-means. Our work provides a method to select a good initial seeding in less time, facilitating accurate cluster analysis over large datasets in shorter time.
引用
收藏
页码:17 / 24
页数:8
相关论文
共 50 条
  • [21] Parallel Two-Phase K-Means
    Cuong Duc Nguyen
    Dung Tien Nguyen
    Van-Hau Pham
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2013, PT V, 2013, 7975 : 224 - 231
  • [22] EFFECTIVE INITIALIZATION OF K-MEANS FOR COLOR QUANTIZATION
    Celebi, M. Emre
    2009 16TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-6, 2009, : 1649 - 1652
  • [23] Parallel K-Means Clustering Based on MapReduce
    Zhao, Weizhong
    Ma, Huifang
    He, Qing
    CLOUD COMPUTING, PROCEEDINGS, 2009, 5931 : 674 - 679
  • [24] Sparse probabilistic K-means
    Jung, Yoon Mo
    Whang, Joyce Jiyoung
    Yun, Sangwoon
    APPLIED MATHEMATICS AND COMPUTATION, 2020, 382
  • [25] Adaptive Graph K-Means
    Pei, Shenfei
    Sun, Yuanchen
    Nie, Feiping
    Jiang, Xudong
    Zheng, Zengwei
    PATTERN RECOGNITION, 2025, 161
  • [26] Robust trimmed k-means
    Dorabiala, Olga
    Kutz, J. Nathan
    Aravkin, Aleksandr Y.
    PATTERN RECOGNITION LETTERS, 2022, 161 : 9 - 16
  • [27] Transformed K-means Clustering
    Goel, Anurag
    Majumdar, Angshul
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 1526 - 1530
  • [28] Spatial Transformer K-Means
    Cosentino, Romain
    Balestriero, Randall
    Bahroun, Yanis
    Sengupta, Anirvan
    Baraniuk, Richard
    Aazhang, Behnaam
    2022 56TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, 2022, : 1444 - 1448
  • [29] K*-Means: An Effective and Efficient K-means Clustering Algorithm
    Qi, Jianpeng
    Yu, Yanwei
    Wang, Lihong
    Liu, Jinglei
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCES ON BIG DATA AND CLOUD COMPUTING (BDCLOUD 2016) SOCIAL COMPUTING AND NETWORKING (SOCIALCOM 2016) SUSTAINABLE COMPUTING AND COMMUNICATIONS (SUSTAINCOM 2016) (BDCLOUD-SOCIALCOM-SUSTAINCOM 2016), 2016, : 242 - 249
  • [30] Sparse Subspace K-means
    Diallo, Abdoul Wahab
    Niang, Ndeye
    Ouattara, Mory
    21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS ICDMW 2021, 2021, : 678 - 685