Automatic centroid initialization in k-means using artificial hummingbird algorithm

Cited by: 0
Authors
Preeti [1]
Kusum Deep [2]
Affiliations
[1] Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand
[2] The University of Tennessee Health Science Centre, Memphis, TN 38163
Keywords
Clustering analysis; Data clustering; K-means; Nature inspired algorithm
DOI
10.1007/s00521-024-10764-4
Abstract
K-means is a widely used clustering technique whose results depend heavily on the initial cluster centroid locations. Poorly chosen centroids can cause the algorithm to get trapped in suboptimal solutions. Additionally, determining the optimal number of clusters for large datasets is computationally expensive. To address these challenges, the recently developed Artificial Hummingbird Algorithm (AHA) is used to initialize cluster centroid locations and automatically determine the best estimate for the number of clusters. AHA simulates the specialized flight skills and intelligent foraging strategies of hummingbirds, striking a fine balance between exploration and exploitation during the search process. Unlike other data clustering approaches that use a fixed threshold in heuristic methods, we propose a dynamic threshold, based on the variance of the data with respect to its centroids, for activating cluster centroids in AHA. The data are automatically partitioned into k clusters such that cohesion, measured by cluster diameters, and separation, measured by nearest neighbor distance, are optimized. The algorithm is tested on various datasets, including real-world data, fundamental clustering benchmarks, synthetic data, and high-dimensional data. Performance is evaluated using fitness value, inter-cluster distance, and intra-cluster distance. Results indicate that the proposed method ranks first and achieves superior clustering performance compared to state-of-the-art algorithms. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.
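The threshold-based activation and the two quality measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The encoding (one activation value per candidate centroid), the fallback to two clusters, and every function name below are assumptions made for illustration. The threshold is passed in as a plain number, because the abstract does not give the paper's variance-based rule.

```python
from itertools import combinations
from math import dist


def decode_active_centroids(activations, centroids, threshold, min_clusters=2):
    """Return the candidate centroids whose activation exceeds the threshold.

    Assumed encoding (not taken from the paper): one activation value per
    candidate centroid. The threshold is supplied by the caller.
    """
    active = [i for i, a in enumerate(activations) if a > threshold]
    if len(active) < min_clusters:
        # Fallback assumption: keep the min_clusters highest activations.
        ranked = sorted(range(len(activations)),
                        key=lambda i: activations[i], reverse=True)
        active = sorted(ranked[:min_clusters])
    return [centroids[i] for i in active]


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]


def group(points, labels):
    """Collect points into one list per label."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    return list(clusters.values())


def max_diameter(clusters):
    """Cohesion: largest pairwise distance inside any single cluster."""
    return max((dist(p, q) for c in clusters for p, q in combinations(c, 2)),
               default=0.0)


def min_separation(clusters):
    """Separation: smallest distance between points of different clusters."""
    return min((dist(p, q)
                for a, b in combinations(clusters, 2) for p in a for q in b),
               default=float("inf"))


# Toy data: two tight groups far apart, and three candidate centroids.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
candidates = [(0.0, 0.5), (5.0, 5.0), (10.0, 0.5)]
activations = [0.9, 0.2, 0.8]

active = decode_active_centroids(activations, candidates, threshold=0.5)
labels = assign(points, active)
clusters = group(points, labels)
diameter = max_diameter(clusters)
separation = min_separation(clusters)
```

In this toy case the middle candidate is switched off, so the number of clusters (two) falls out of the activations instead of being fixed in advance. A search procedure such as AHA would then favor candidates with a small diameter and a large separation. How the paper combines the two into a single fitness value is not stated in the abstract.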
Pages: 3373-3398
Page count: 25