A cluster centers initialization method for clustering categorical data

Cited by: 62
Authors
Bai, Liang [1, 2]
Liang, Jiye [1]
Dang, Chuangyin [2]
Cao, Fuyuan [1]
Affiliations
[1] Shanxi Univ, Sch Comp & Informat Technol, Minist Educ, Key Lab Computat Intelligence & Chinese Informat, Taiyuan 030006, Shanxi, Peoples R China
[2] City Univ Hong Kong, Dept Mfg Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
The k-modes algorithm; Initialization method; Initial cluster centers; Density; Distance; GENETIC ALGORITHM;
DOI
10.1016/j.eswa.2012.01.131
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The leading partitional clustering technique, k-modes, is one of the most computationally efficient clustering methods for categorical data. However, the k-modes algorithm converges to one of numerous local minima, so its performance depends strongly on the initial cluster centers. Currently, most methods for initializing cluster centers are designed for numerical data. Because categorical data lack geometric structure, these methods are not applicable to categorical data. This paper proposes a novel initialization method for categorical data and applies it to the k-modes algorithm. The method combines distance and density to select initial cluster centers and overcomes shortcomings of the existing initialization methods for categorical data. Experimental results show that the proposed initialization method is effective and, because its time complexity is linear in the number of data objects, can be applied to large data sets. (C) 2012 Elsevier Ltd. All rights reserved.
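The idea of combining density and distance when choosing initial centers can be illustrated with a minimal Python sketch. This is not the paper's exact procedure, which the abstract does not spell out. The sketch assumes that an object's density is the average relative frequency of its attribute values, that distance is simple matching (Hamming) distance, and that each new center maximizes density times distance to the nearest center already chosen. The function names `density` and `init_centers` are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching distance: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def density(data):
    """Assumed density: average relative frequency of each object's attribute values."""
    n, m = len(data), len(data[0])
    # One frequency table per attribute, built in a single pass over the data.
    counts = [Counter(row[j] for row in data) for j in range(m)]
    return [sum(counts[j][row[j]] for j in range(m)) / (n * m) for row in data]


def init_centers(data, k):
    """Pick k initial centers (assumes 1 <= k <= len(data)).

    The first center is the densest object; each later center maximizes
    density * distance to the nearest already-chosen center (an assumed rule).
    """
    dens = density(data)
    first = max(range(len(data)), key=lambda i: dens[i])
    chosen = [first]
    # Distance from every object to its nearest chosen center so far.
    min_dist = [hamming(row, data[first]) for row in data]
    while len(chosen) < k:
        nxt = max(range(len(data)), key=lambda i: min_dist[i] * dens[i])
        chosen.append(nxt)
        min_dist = [min(d, hamming(row, data[nxt])) for d, row in zip(min_dist, data)]
    return [data[i] for i in chosen]


if __name__ == "__main__":
    toy = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]
    print(init_centers(toy, 2))
```

Under these assumptions, each step is a single pass over the objects, so the cost for n objects, m attributes, and k centers is O(n * m * k), which is linear in n and consistent with the linear-time property the abstract reports.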
Pages: 8022-8029
Number of pages: 8
Related papers
50 in total
  • [31] A genetic fuzzy k-Modes algorithm for clustering categorical data
    Gan, G.
    Wu, J.
    Yang, Z.
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (02) : 1615 - 1620
  • [32] Succinct Initialization Methods for Clustering Algorithms
    Liang, Xueru
    Ren, Shangkun
    Yang, Lei
    ADVANCED INTELLIGENT COMPUTING, 2011, 6838 : 47 - +
  • [33] Robust Categorical Data Clustering Guided by Multi-Granular Competitive Learning
    Cai, Shenghong
    Zhang, Yiqun
    Luo, Xiaopeng
    Cheung, Yiu-Ming
    Jia, Hong
    Li, Peng
    2024 IEEE 44TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, ICDCS 2024, 2024, : 288 - 299
  • [34] Partition-and-merge based fuzzy genetic clustering algorithm for categorical data
    Thi Phuong Quyen Nguyen
    Kuo, R. J.
    APPLIED SOFT COMPUTING, 2019, 75 : 254 - 264
  • [35] Graph Enhanced Fuzzy Clustering for Categorical Data Using a Bayesian Dissimilarity Measure
    Zhang, Chuanbin
    Chen, Long
    Zhao, Yin-Ping
    Wang, Yingxu
    Chen, C. L. Philip
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2023, 31 (03) : 810 - 824
  • [36] Hybrid Evolutionary Multiobjective Fuzzy C-Medoids Clustering of Categorical Data
    Mukhopadhyay, Anirban
    Maulik, Ujjwal
    Bandyopadhyay, Sanghamitra
    PROCEEDINGS OF THE 2013 IEEE WORKSHOP ON HYBRID INTELLIGENT MODELS AND APPLICATIONS (HIMA), 2013, : 7 - 12
  • [37] Fuzzy Centroid and Genetic Algorithms: Solutions for Numeric and Categorical Mixed Data Clustering
    Nooraeni, Rani
    Arsa, Muhamad Iqbal
    Projo, Nucke Widowati Kusumo
    5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE 2020, 2021, 179 : 677 - 684
  • [38] Clustering by exponential density analysis and find of cluster centers based on genetic algorithm
    Kun, Dong
    Ze, Wang
    Rui, Zhang
    Chao, Yin
    EIGHTH INTERNATIONAL CONFERENCE ON DIGITAL IMAGE PROCESSING (ICDIP 2016), 2016, 10033
  • [39] Initialization of K-modes clustering using outlier detection techniques
    Jiang, Feng
    Liu, Guozhu
    Du, Junwei
    Sui, Yuefei
    INFORMATION SCIENCES, 2016, 332 : 167 - 183
  • [40] EDMD: An Entropy based Dissimilarity measure to cluster Mixed-categorical Data
    Kar, Amit Kumar
    Akhter, Mohammad Maksood
    Mishra, Amaresh Chandra
    Mohanty, Sraban Kumar
    PATTERN RECOGNITION, 2024, 155