A cluster centers initialization method for clustering categorical data

Cited by: 62
Authors
Bai, Liang [1 ,2 ]
Liang, Jiye [1 ]
Dang, Chuangyin [2 ]
Cao, Fuyuan [1 ]
Affiliations
[1] Shanxi Univ, Sch Comp & Informat Technol, Minist Educ, Key Lab Computat Intelligence & Chinese Informat, Taiyuan 030006, Shanxi, Peoples R China
[2] City Univ Hong Kong, Dept Mfg Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
The k-modes algorithm; Initialization method; Initial cluster centers; Density; Distance; GENETIC ALGORITHM;
DOI
10.1016/j.eswa.2012.01.131
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The k-modes algorithm, a leading partitional clustering technique, is one of the most computationally efficient clustering methods for categorical data. However, the algorithm can converge to any of numerous local minima, so its performance depends strongly on the initial cluster centers. Most existing methods for initializing cluster centers are designed for numerical data. Because categorical data lack geometric structure, these methods are not applicable to categorical data. This paper proposes a novel initialization method for categorical data and applies it to the k-modes algorithm. The method combines distance and density to select initial cluster centers and overcomes shortcomings of existing initialization methods for categorical data. Experimental results show that the proposed method is effective and, because its time complexity is linear in the number of data objects, can be applied to large data sets. (C) 2012 Elsevier Ltd. All rights reserved.
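As a rough illustration of the idea of combining density and distance when choosing initial centers for categorical data, the following Python sketch may help. It is not the method from the paper: the abstract does not give the formulas, so the density definition (average relative frequency of an object's attribute values), the simple matching dissimilarity, and the product-based selection score are all assumptions, and the function names `matching_distance`, `densities`, and `init_centers` are hypothetical.

```python
from collections import Counter


def matching_distance(a, b):
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def densities(data):
    """Assumed density: mean relative frequency of each object's attribute values."""
    n, m = len(data), len(data[0])
    # One frequency table per attribute (column).
    counts = [Counter(row[j] for row in data) for j in range(m)]
    return [sum(counts[j][row[j]] for j in range(m)) / (m * n) for row in data]


def init_centers(data, k):
    """Pick k initial centers: the densest object first, then objects that are
    both dense and far from the centers already chosen (assumed scoring rule)."""
    dens = densities(data)
    chosen = [max(range(len(data)), key=lambda i: dens[i])]
    while len(chosen) < k:
        def score(i):
            # Distance to the nearest chosen center, weighted by density.
            nearest = min(matching_distance(data[i], data[c]) for c in chosen)
            return nearest * dens[i]
        candidates = (i for i in range(len(data)) if i not in chosen)
        chosen.append(max(candidates, key=score))
    return [data[i] for i in chosen]


if __name__ == "__main__":
    toy = [("a", "x"), ("a", "x"), ("a", "y"),
           ("b", "z"), ("b", "z"), ("b", "w")]
    print(init_centers(toy, 2))
```

For a fixed number of clusters and attributes, this sketch visits each object a constant number of times per selected center, so its running time grows linearly with the number of objects, in the spirit of the complexity claim in the abstract.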
Pages: 8022-8029
Page count: 8
Related papers
50 in total
  • [21] Incremental learning based multiobjective fuzzy clustering for categorical data
    Saha, Indrajit
    Maulik, Ujjwal
    INFORMATION SCIENCES, 2014, 267 : 35 - 57
  • [22] Subspace Clustering of Categorical and Numerical Data With an Unknown Number of Clusters
    Jia, Hong
    Cheung, Yiu-Ming
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (08) : 3308 - 3325
  • [23] An efficient entropy based dissimilarity measure to cluster categorical data
    Kar, Amit Kumar
    Mishra, Amaresh Chandra
    Mohanty, Sraban Kumar
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 119
  • [24] Improved Initialization Method for Simple and Fast K-medoids Clustering
    Kim, Sung-Soo
    Kang, Bum-Su
    INDUSTRIAL ENGINEERING AND MANAGEMENT SYSTEMS, 2023, 22 (01) : 63 - 72
  • [25] A population initialization method for evolutionary algorithms based on clustering and Cauchy deviates
    Bajer, Drazen
    Martinovic, Goran
    Brest, Janez
    EXPERT SYSTEMS WITH APPLICATIONS, 2016, 60 : 294 - 310
  • [26] New Improved technique for initial cluster centers of K means clustering using Genetic Algorithm
    Bhatia, Surbhi
    2014 INTERNATIONAL CONFERENCE FOR CONVERGENCE OF TECHNOLOGY (I2CT), 2014,
  • [27] Optimization of cluster cooling performance for data centers
    Shrivastava, Saurabh K.
    VanGilder, James W.
    Sammakia, Bahgat G.
    2008 11TH IEEE INTERSOCIETY CONFERENCE ON THERMAL AND THERMOMECHANICAL PHENOMENA IN ELECTRONIC SYSTEMS, VOLS 1-3, 2008, : 1161 - +
  • [28] Categorical Data Clustering Using Harmony Search Algorithm for Healthcare Datasets
    Sharma, Abha
    Kumar, Pushpendra
    Babulal, Kanojia Sindhuben
    Obaid, Ahmed J.
    Patel, Harshita
    INTERNATIONAL JOURNAL OF E-HEALTH AND MEDICAL COMMUNICATIONS, 2022, 13 (04)
  • [29] Categorical data clustering: 25 years beyond K-modes
    Dinh, Tai
    Wong, Hauchi
    Fournier-Viger, Philippe
    Lisik, Daniil
    Ha, Minh-Quyet
    Dam, Hieu-Chi
    Huynh, Van-Nam
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 272
  • [30] Multiple factor analysis and clustering of a mixture of quantitative, categorical and frequency data
    Becue-Bertaut, Monica
    Pages, Jerome
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2008, 52 (06) : 3255 - 3268