The k-means Algorithm: A Comprehensive Survey and Performance Evaluation

Cited by: 700
Authors
Ahmed, Mohiuddin [1 ]
Seraj, Raihan [2 ]
Islam, Syed Mohammed Shamsul [1 ,3 ]
Affiliations
[1] Edith Cowan Univ, Sch Sci, Joondalup 6027, Australia
[2] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ H3A 0G4, Canada
[3] Univ Western Australia, Sch Comp Sci & Software Engn, Crawley 6009, Australia
Keywords
clustering; k-means; initialization; categorical attributes; cyber security; healthcare; unsupervised learning; CLUSTERING-ALGORITHM; KERNEL; SELECTION; SPARSE;
DOI
10.3390/electronics9081295
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
The k-means clustering algorithm is considered one of the most powerful and popular data mining algorithms in the research community. However, despite its popularity, the algorithm has certain limitations, including problems associated with random initialization of the centroids which leads to unexpected convergence. Additionally, such a clustering algorithm requires the number of clusters to be defined beforehand, which is responsible for different cluster shapes and outlier effects. A fundamental problem of the k-means algorithm is its inability to handle various data types. This paper provides a structured and synoptic overview of research conducted on the k-means algorithm to overcome such shortcomings. Variants of the k-means algorithms including their recent developments are discussed, where their effectiveness is investigated based on the experimental analysis of a variety of datasets. The detailed experimental analysis along with a thorough comparison among different k-means clustering algorithms differentiates our work compared to other existing survey papers. Furthermore, it outlines a clear and thorough understanding of the k-means algorithm along with its different research directions.
Pages: 1 / 12
Number of pages: 12
Related papers
76 in total
[1]   Coordinate Rotation-Based Low Complexity K-Means Clustering Architecture [J].
Adapa, Bhagyaraja ;
Biswas, Dwaipayan ;
Bhardwaj, Swati ;
Raghuraman, Shashank ;
Acharyya, Amit ;
Maharatna, Koushik .
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2017, 25 (04) :1568-1572
[2]   A k-mean clustering algorithm for mixed numeric and categorical data [J].
Ahmad, Amir ;
Dey, Lipika .
DATA & KNOWLEDGE ENGINEERING, 2007, 63 (02) :503-527
[3]  
Ahmed Mohiuddin, 2017, 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), P998, DOI 10.1145/3110025.3119402
[4]  
Ahmed M., 2018, Ann. Data Sci., V5, P497, DOI 10.1007/S40745-018-0149-0
[5]  
Ahmed M., 2016, NETWORK TRAFFIC DATA
[6]  
Ahmed M., 2004, P 9 IEEE INT C IND E, P1141
[7]   Data summarization: a survey [J].
Ahmed, Mohiuddin .
KNOWLEDGE AND INFORMATION SYSTEMS, 2019, 58 (02) :249-273
[8]   Infrequent pattern mining in smart healthcare environment using data summarization [J].
Ahmed, Mohiuddin ;
Ullah, Abu S. S. M. Barkat .
JOURNAL OF SUPERCOMPUTING, 2018, 74 (10) :5041-5059
[9]   An Unsupervised Approach of Knowledge Discovery from Big Data in Social Network [J].
Ahmed, Mohiuddin .
EAI Endorsed Transactions on Scalable Information Systems, 2017, 4 (14) :1-6
[10]   A survey of anomaly detection techniques in financial domain [J].
Ahmed, Mohiuddin ;
Mahmood, Abdun Naser ;
Islam, Md. Rafiqul .
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2016, 55 :278-288