Automatic centroid initialization in k-means using artificial hummingbird algorithm

Cited by: 0

Authors:

Kusum Preeti [1]

Deep [2]

Affiliations:

[1] Department of Mathematics, Indian Institute of Technology Roorkee, Uttarakhand, Roorkee

[2] The University of Tennessee Health Science Centre, Memphis, 38163, TN

Source:

Neural Computing and Applications | 2025 / Vol. 37 / Issue 5

Keywords:

Clustering analysis; Data clustering; K-means; Nature inspired algorithm

DOI:

10.1007/s00521-024-10764-4

Abstract:

K-means is a widely used clustering technique that relies heavily on the initial locations of the cluster centroids. Poorly chosen centroids can cause the algorithm to get trapped in suboptimal solutions. Additionally, determining the optimal number of clusters for large datasets is computationally expensive. To address these challenges, the recently developed Artificial Hummingbird Algorithm (AHA) is used to initialize cluster centroid locations and automatically determine the best estimate for the number of clusters. AHA simulates the specialized flight skills and intelligent foraging strategies of hummingbirds, balancing exploration and exploitation during the search process. Unlike other data clustering approaches that use a fixed threshold in heuristic methods, we propose a dynamic threshold, based on the variance of the data with respect to its centroids, for activating cluster centroids in AHA. The data are automatically partitioned into k clusters such that cohesion, measured by cluster diameters, and separation, measured by nearest neighbor distance, are optimized. The algorithm is tested on various datasets, including real-world data, fundamental clustering benchmarks, synthetic data, and high-dimensional data. Performance is evaluated using fitness value, inter-cluster distance, and intra-cluster distance. Results indicate that the proposed method ranked first and achieved superior clustering performance compared to state-of-the-art algorithms. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.
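
The abstract names two quantities, cohesion measured by cluster diameters and separation measured by nearest neighbor distance, without giving formulas. The sketch below shows one plausible reading of them for a fixed labeling. It is an illustration only and not the paper's fitness function: the function names, the use of Euclidean distance, and the choice to report the largest diameter and the smallest between-cluster distance are all assumptions.

```python
import numpy as np


def cluster_diameter(points):
    """Largest pairwise Euclidean distance inside one cluster (0.0 for a singleton)."""
    diffs = points[:, None, :] - points[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).max())


def nearest_neighbor_separation(a, b):
    """Smallest Euclidean distance between a point of cluster a and a point of cluster b."""
    diffs = a[:, None, :] - b[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).min())


def cohesion_and_separation(data, labels):
    """Return (largest cluster diameter, smallest between-cluster distance).

    Assumes at least two distinct labels; with a single cluster there is
    no pair to compare and min() would raise ValueError.
    """
    clusters = [data[labels == k] for k in np.unique(labels)]
    max_diameter = max(cluster_diameter(c) for c in clusters)
    min_separation = min(
        nearest_neighbor_separation(clusters[i], clusters[j])
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    return max_diameter, min_separation


# Toy data (hypothetical): two tight pairs of points, five units apart.
data = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
labels = np.array([0, 0, 1, 1])
diameter, separation = cohesion_and_separation(data, labels)
```

Under this reading, a good partition has a small diameter and a large separation. The pairwise distance matrices make the sketch quadratic in cluster size, so it suits small examples rather than the large datasets the abstract mentions.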

Pages: 3373-3398

Page count: 25
