An Initialization Method Based on Hybrid Distance for k-Means Algorithm

Cited by: 11
Authors
Yang, Jie [1 ]
Ma, Yan [1 ]
Zhang, Xiangfen [1 ]
Li, Shunbao [2 ]
Zhang, Yuping [1 ]
Affiliations
[1] Shanghai Normal Univ, Coll Informat Mech & Elect Engn, Shanghai 200234, Peoples R China
[2] Shanghai Normal Univ, Coll Math & Sci, Shanghai 200234, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1162/neco_a_01014
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The traditional k-means algorithm is widely used as a simple and efficient clustering method. However, its performance depends heavily on the selection of initial cluster centers, so the method used to choose them is extremely important. In this letter, we redefine the density of a point according to the number of its neighbors and the distances between the point and those neighbors. In addition, we define a new distance measure that considers both Euclidean distance and density. Based on that, we propose an algorithm for selecting initial cluster centers that can dynamically adjust the weighting parameter. Furthermore, we propose a new internal clustering validation measure, the clustering validation index based on the neighbors (CVN), which can be used to select the optimal result among multiple clustering results. Experimental results show that the proposed algorithm outperforms existing initialization methods on real-world data sets and demonstrate its adaptability to data sets with various characteristics.
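The general idea of combining density with Euclidean distance when picking initial centers can be illustrated with a minimal sketch. This is not the authors' formulation: the abstract does not give the density definition, the hybrid distance, or the rule for adjusting the weight, so the inverse mean neighbor distance, the linear score, the fixed `weight`, and the names `knn_density` and `hybrid_init` are all assumptions made for illustration.

```python
import numpy as np


def knn_density(X, n_neighbors=5):
    """Assumed density: inverse of the mean distance to the n nearest neighbors."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Column 0 of each sorted row is the point's zero distance to itself, so skip it.
    nearest = np.sort(dist, axis=1)[:, 1:n_neighbors + 1]
    density = 1.0 / (nearest.mean(axis=1) + 1e-12)
    return density, dist


def _rescale(values):
    """Map values linearly onto [0, 1]; constant input maps to zeros."""
    span = values.max() - values.min()
    return (values - values.min()) / span if span > 0 else np.zeros_like(values)


def hybrid_init(X, n_clusters, weight=0.5, n_neighbors=5):
    """Pick initial centers by an assumed score mixing separation and density.

    weight=1.0 uses only distance to the centers chosen so far;
    weight=0.0 uses only density. The weight is fixed here, not adapted.
    """
    X = np.asarray(X, dtype=float)
    density, dist = knn_density(X, n_neighbors)
    dens = _rescale(density)
    chosen = [int(np.argmax(dens))]  # start from the densest point
    while len(chosen) < n_clusters:
        # Distance from every point to its closest already-chosen center.
        gap = _rescale(dist[:, chosen].min(axis=1))
        score = weight * gap + (1.0 - weight) * dens
        score[chosen] = -np.inf  # never pick the same point twice
        chosen.append(int(np.argmax(score)))
    return X[chosen]
```

The returned array can be passed as the starting centers to any k-means implementation that accepts explicit initial centers. The pairwise distance matrix makes this sketch quadratic in the number of points, so it suits small data sets only.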
Pages: 3094 - 3117
Page count: 24
Related papers
50 in total
  • [41] Automatic centroid initialization in k-means using artificial hummingbird algorithm
    Preeti
    Kusum Deep
    Neural Computing and Applications, 2025, 37 (5) : 3373 - 3398
  • [42] An entropy-based initialization method of K-means clustering on the optimal number of clusters
    Chowdhury, Kuntal
    Chaudhuri, Debasis
    Pal, Arup Kumar
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (12) : 6965 - 6982
  • [43] Min–max kurtosis mean distance based k-means initial centroid initialization method for big genomic data clustering
    Kamlesh Kumar Pandey
    Diwakar Shukla
    Evolutionary Intelligence, 2023, 16 : 1055 - 1076
  • [44] An entropy-based initialization method of K-means clustering on the optimal number of clusters
    Kuntal Chowdhury
    Debasis Chaudhuri
    Arup Kumar Pal
    Neural Computing and Applications, 2021, 33 : 6965 - 6982
  • [45] Motif-Based Method for Initialization the K-Means Clustering for Time Series Data
    Le Phu
    Duong Tuan Anh
    AI 2011: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2011, 7106 : 11 - 20
  • [46] A Near-Optimal Centroids Initialization in K-means Algorithm Using Bees Algorithm
    Mahmuddin, M.
    Yusof, Y.
    COMPUTING & INFORMATICS, 2009 : 172 - 175
  • [47] K-means Algorithm Based on Quasi Ideal Point Method
    Liu, Hui-Ming
    Bai, Jie
    Gan, Chen-Ming
    2017 CHINESE AUTOMATION CONGRESS (CAC), 2017 : 2900 - 2903
  • [48] An Improved k-means Algorithm based on Average Diameter Method
    Zhao, Yang
    Zeng, Bi
    GREEN ENERGY AND SUSTAINABLE DEVELOPMENT I, 2017, 1864
  • [49] EM Algorithm with Initialization Based on Incremental k-means for GMM and Its Application to Speaker Identification
    Lee, Younjeong
    Seo, Changwoo
    Hahn, Hernsoo
    Lee, Kiyong
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2005, 24 (03) : 141 - 149
  • [50] Genetic TKM: A Hybrid Clustering Method Based on Genetic Algorithm, Tabu Search and K-Means
    Yaghini, Masoud
    Gereilinia, Nasim
    INTERNATIONAL JOURNAL OF APPLIED METAHEURISTIC COMPUTING, 2013, 4 (01) : 67 - 77