A cluster centers initialization method for clustering categorical data

Cited by: 62
Authors
Bai, Liang [1, 2]
Liang, Jiye [1]
Dang, Chuangyin [2]
Cao, Fuyuan [1]
Affiliations
[1] Shanxi Univ, Sch Comp & Informat Technol, Minist Educ, Key Lab Computat Intelligence & Chinese Informat, Taiyuan 030006, Shanxi, Peoples R China
[2] City Univ Hong Kong, Dept Mfg Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
The k-modes algorithm; Initialization method; Initial cluster centers; Density; Distance; GENETIC ALGORITHM;
DOI
10.1016/j.eswa.2012.01.131
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The leading partitional clustering technique, k-modes, is one of the most computationally efficient clustering methods for categorical data. However, the k-modes algorithm converges to one of numerous local minima, so its performance depends strongly on the initial cluster centers. Currently, most methods for initializing cluster centers are designed for numerical data. Because categorical data lack geometric structure, these methods are not applicable to categorical data. This paper proposes a novel initialization method for categorical data, implemented for the k-modes algorithm. The method integrates distance and density to select initial cluster centers and overcomes shortcomings of the existing initialization methods for categorical data. Experimental results illustrate that the proposed initialization method is effective and can be applied to large data sets because of its linear time complexity with respect to the number of data objects. (C) 2012 Elsevier Ltd. All rights reserved.
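The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea of combining density and distance to pick initial centers for k-modes. The density measure (average relative frequency of an object's attribute values), the simple matching distance, the product score, and the function names `hamming`, `density`, and `init_centers` are all assumptions made for illustration, not the authors' definitions.

```python
from collections import Counter

def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def density(data):
    """Assumed density: mean relative frequency of each object's attribute values."""
    n, m = len(data), len(data[0])
    counts = [Counter(row[j] for row in data) for j in range(m)]
    return [sum(counts[j][row[j]] for j in range(m)) / (n * m) for row in data]

def init_centers(data, k):
    """Pick k initial centers that are both dense and far from earlier picks."""
    dens = density(data)
    # First center: the object with the highest density.
    chosen = [max(range(len(data)), key=lambda i: dens[i])]
    while len(chosen) < k:
        # Assumed score: distance to the nearest chosen center times density.
        # Objects identical to a chosen center score 0 and are picked again
        # only if no distinct object remains.
        def score(i):
            return min(hamming(data[i], data[c]) for c in chosen) * dens[i]
        chosen.append(max(range(len(data)), key=score))
    return [data[i] for i in chosen]

data = [("a", "x"), ("a", "x"), ("a", "y"),
        ("b", "z"), ("b", "z"), ("b", "z")]
centers = init_centers(data, 2)
```

For a fixed number of clusters and attributes, each selection round scans the data once, so this sketch scales linearly with the number of objects, which matches the complexity property the abstract claims for the proposed method.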
Pages: 8022 - 8029
Number of pages: 8
Related papers
50 in total
  • [11] A Novel Cluster Prediction Approach Based on Locality-Sensitive Hashing for Fuzzy Clustering of Categorical Data
    Toan Nguyen Mau
    Inoguchi, Yasushi
    Van-Nam Huynh
    IEEE ACCESS, 2022, 10 : 34196 - 34206
  • [12] A Novel Initialization Method for Semi-supervised Clustering
    Dang, Yanzhong
    Xuan, Zhaoguo
    Rong, Lili
    Liu, Ming
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, 2010, 6291 : 317 - 328
  • [13] Integrated Rough Fuzzy Clustering for Categorical data Analysis
    Saha, Indrajit
    Sarkar, Jnanendra Prasad
    Maulik, Ujjwal
    FUZZY SETS AND SYSTEMS, 2019, 361 : 1 - 32
  • [14] Integrating Clustering and Supervised Learning for Categorical Data Analysis
    Maulik, Ujjwal
    Bandyopadhyay, Sanghamitra
    Saha, Indrajit
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2010, 40 (04): : 664 - 675
  • [15] Combining Fuzzy Clustering with ANN Classifier for Categorical Data
    Saha, Indrajit
    Mukhopadhyay, Anirban
    Maulik, Ujjwal
    2009 IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE, VOLS 1-3, 2009, : 44 - 49
  • [16] Model-Based Hierarchical Clustering for Categorical Data
    Alalyan, Fahdah
    Zamzami, Nuha
    Bouguila, Nizar
    2019 IEEE 28TH INTERNATIONAL SYMPOSIUM ON INDUSTRIAL ELECTRONICS (ISIE), 2019, : 1424 - 1429
  • [17] A Mutual Information Based on Ant Colony Optimization Method to Feature Selection for Categorical Data Clustering
    Shojaee, Z.
    Fazeli, S. A. Shahzadeh
    Abbasi, E.
    Adibnia, F.
    Masuli, F.
    Rovetta, S.
    IRANIAN JOURNAL OF SCIENCE, 2023, 47 (01) : 175 - 186
  • [18] Initialization method of genetic algorithm based on improved clustering algorithm
    Li, Hao
    Jiang, Xuesong
    Wei, Xiumei
    PROCEEDINGS OF THE 2022 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION, GECCO 2022, 2022, : 447 - 450
  • [19] Consensus Multiobjective Differential Crisp Clustering for Categorical Data Analysis
    Saha, Indrajit
    Plewczynski, Dariusz
    Maulik, Ujjwal
    Bandyopadhyay, Sanghamitra
    ROUGH SETS AND CURRENT TRENDS IN COMPUTING, PROCEEDINGS, 2010, 6086 : 30 - +
  • [20] A Genetic Algorithm Based Ensemble Approach for Categorical Data Clustering
    Goswami, Jyoti Prokash
    Mahanta, Anjana Kakoti
    2015 ANNUAL IEEE INDIA CONFERENCE (INDICON), 2015,