Automatic centroid initialization in k-means using artificial hummingbird algorithm

Cited by: 0
Authors
Preeti [1]
Kusum Deep [2]
Affiliations
[1] Department of Mathematics, Indian Institute of Technology Roorkee, Uttarakhand, Roorkee
[2] The University of Tennessee Health Science Centre, Memphis, 38163, TN
Keywords
Clustering analysis; Data clustering; K-means; Nature inspired algorithm
DOI
10.1007/s00521-024-10764-4
Abstract
K-means is a widely used clustering technique that relies heavily on the initial locations of the cluster centroids. Poorly chosen centroids can cause the algorithm to get trapped in suboptimal solutions. Additionally, determining the optimal number of clusters for large datasets is computationally expensive. To address these challenges, a recently developed Artificial Hummingbird Algorithm (AHA) is used to initialize cluster centroid locations and automatically determine the best estimate for the number of clusters. AHA simulates the specialized flight skills and intelligent foraging strategies of hummingbirds, striking a fine balance between exploration and exploitation during the search process. Unlike other data clustering approaches that use a fixed threshold in heuristic methods, we propose a dynamic threshold based on the variance of the data with respect to its centroids for activating cluster centroids in AHA. The data are automatically partitioned into k clusters such that cohesion, measured by cluster diameters, and separation, measured by nearest neighbor distance, are optimized. The algorithm is tested on various datasets, including real-world data, fundamental clustering benchmarks, synthetic data, and high-dimensional data. Performance is evaluated using metrics such as fitness value, inter-cluster distance, and intra-cluster distance. Results indicate that the proposed method ranked first and achieved superior clustering performance compared to state-of-the-art algorithms. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.
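The abstract names two ingredients without giving formulas: a threshold that activates a subset of candidate centroids, and a fitness built from cohesion (cluster diameter) and separation (nearest neighbor distance). The Python sketch below is one possible reading of those ideas and should not be taken as the authors' method. The function names, the activation encoding, the rule of keeping at least two centroids, and the exact definitions of diameter and separation are all assumptions made for illustration. The threshold is passed in as a plain parameter because the abstract does not state how the variance-based value is computed.

```python
import numpy as np

def active_centroids(candidates, activations, threshold):
    """Keep candidate centroids whose activation exceeds the threshold.

    Assumption: each candidate carries an activation value, and at least
    two centroids are always kept so that a partition exists.
    """
    mask = activations > threshold
    if mask.sum() < 2:
        mask = np.zeros(len(activations), dtype=bool)
        mask[np.argsort(activations)[-2:]] = True
    return candidates[mask]

def assign_labels(X, centroids):
    """Assign each point to its nearest centroid (Euclidean distance)."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def cohesion_and_separation(X, labels):
    """Return (largest cluster diameter, smallest between-cluster distance).

    Assumption: diameter is the largest pairwise distance inside a cluster,
    and separation is the smallest distance between two points that belong
    to different clusters.
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    same = labels[:, None] == labels[None, :]
    diameter = D[same].max()
    separation = D[~same].min() if (~same).any() else np.inf
    return diameter, separation

# Toy data: two tight groups far apart, plus one spurious candidate centroid.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
candidates = np.array([[0.0, 0.5], [10.0, 0.5], [5.0, 5.0]])
activations = np.array([0.9, 0.8, 0.1])

centroids = active_centroids(candidates, activations, threshold=0.5)
labels = assign_labels(X, centroids)
diameter, separation = cohesion_and_separation(X, labels)
print(len(centroids), labels, diameter, separation)
```

Under this reading, a smaller diameter and a larger separation indicate a better partition. A search procedure such as AHA would evaluate many candidate and activation vectors and keep the best-scoring one before handing the selected centroids to k-means.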
Pages: 3373-3398
Page count: 25