A cluster centers initialization method for clustering categorical data

Times cited: 62
Authors
Bai, Liang [1, 2]
Liang, Jiye [1]
Dang, Chuangyin [2]
Cao, Fuyuan [1]
Affiliations
[1] Shanxi Univ, Sch Comp & Informat Technol, Minist Educ, Key Lab Computat Intelligence & Chinese Informat, Taiyuan 030006, Shanxi, Peoples R China
[2] City Univ Hong Kong, Dept Mfg Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
The k-modes algorithm; Initialization method; Initial cluster centers; Density; Distance; GENETIC ALGORITHM
DOI
10.1016/j.eswa.2012.01.131
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The leading partitional clustering technique, k-modes, is one of the most computationally efficient clustering methods for categorical data. However, because the k-modes algorithm can converge to numerous local minima, its performance depends strongly on the initial cluster centers. Most existing methods for initializing cluster centers are designed for numerical data; because categorical data lack the geometric structure of numerical data, these methods are not applicable to categorical data. This paper proposes a novel initialization method for categorical data and applies it to the k-modes algorithm. The method combines distance and density to select the initial cluster centers and overcomes shortcomings of existing initialization methods for categorical data. Experimental results show that the proposed method is effective and, because its time complexity is linear in the number of data objects, can be applied to large data sets. (C) 2012 Elsevier Ltd. All rights reserved.
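The abstract describes the two ingredients (distance and density) only at a high level. The following is a hypothetical sketch, not the authors' algorithm: it assumes the simple matching dissimilarity used by k-modes, an assumed density score (the mean relative frequency of an object's attribute values), and a seeding rule that takes the densest object first and then repeatedly picks the object maximizing density times distance to its nearest already-chosen center. The paper's exact density and selection rules may differ, and the function names `mismatch`, `densities`, `init_centers`, and `k_modes` are illustrative only.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def mismatch(x: Sequence[str], y: Sequence[str]) -> int:
    """Simple matching dissimilarity: count of attributes where x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def densities(data: List[Record]) -> List[float]:
    """ASSUMED density: mean relative frequency of an object's attribute values.

    One pass counts value frequencies per attribute and one pass scores the
    objects, so the cost is O(n * m) for n objects and m attributes.
    """
    n, m = len(data), len(data[0])
    counts = [Counter(row[j] for row in data) for j in range(m)]
    return [sum(counts[j][row[j]] for j in range(m)) / (n * m) for row in data]


def init_centers(data: List[Record], k: int) -> List[Record]:
    """HYPOTHETICAL density-plus-distance seeding, not the paper's exact rule."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of objects")
    dens = densities(data)
    # First center: the densest object (ties go to the earliest index).
    chosen = [max(range(len(data)), key=lambda i: dens[i])]
    while len(chosen) < k:
        # Prefer objects that are both dense and far from every chosen center.
        candidates = [i for i in range(len(data)) if i not in chosen]
        best = max(
            candidates,
            key=lambda i: dens[i] * min(mismatch(data[i], data[c]) for c in chosen),
        )
        chosen.append(best)
    return [data[i] for i in chosen]


def k_modes(data: List[Record], k: int, max_iter: int = 100) -> Tuple[List[Record], List[int]]:
    """Standard k-modes iteration started from init_centers."""
    modes = init_centers(data, k)
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: each object joins its nearest mode.
        new_labels = [min(range(k), key=lambda c: mismatch(row, modes[c])) for row in data]
        if new_labels == labels:
            break  # converged: no object changed cluster
        labels = new_labels
        # Update step: each mode takes the most frequent value per attribute.
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous mode if a cluster is empty
                modes[c] = tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
    return modes, labels


toy = [
    ("red", "small", "round"), ("red", "small", "round"), ("red", "small", "oval"),
    ("blue", "large", "square"), ("blue", "large", "square"), ("blue", "large", "oval"),
]
modes, labels = k_modes(toy, k=2)
print(modes)   # [('red', 'small', 'round'), ('blue', 'large', 'square')]
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Weighting distance by density is meant to avoid the weakness of pure farthest-point seeding, which tends to pick outliers as centers. In this sketch the density pass is O(n·m) and the selection loop is O(n·k²·m), so the whole initialization stays linear in the number of objects n for fixed k and m, consistent with the complexity claim in the abstract.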
Pages: 8022-8029
Number of pages: 8
Related papers
50 in total
  • [41] G-ANMI: A mutual information based genetic clustering algorithm for categorical data
    Deng, Shengchun
    He, Zengyou
    Xu, Xiaofei
    KNOWLEDGE-BASED SYSTEMS, 2010, 23 (02) : 144 - 149
  • [42] An Adaptive Initial Cluster Centers Selection Algorithm for High-dimensional Partition Clustering
    Gao, Zhipeng
    Fan, Yidan
    2017 IEEE 15TH INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, 15TH INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, 3RD INTL CONF ON BIG DATA INTELLIGENCE AND COMPUTING AND CYBER SCIENCE AND TECHNOLOGY CONGRESS(DASC/PICOM/DATACOM/CYBERSCI, 2017, : 1119 - 1126
  • [43] Semi-supervised K-Means Clustering by Optimizing Initial Cluster Centers
    Wang, Xin
    Wang, Chaofei
    Shen, Junyi
    WEB INFORMATION SYSTEMS AND MINING, PT II, 2011, 6988 : 178 - +
  • [44] An Improved K-means text clustering algorithm By Optimizing initial cluster centers
    Xiong, Caiquan
    Hua, Zhen
    Lv, Ke
    Li, Xuan
    2016 7TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA (CCBD), 2016, : 265 - 268
  • [45] Min-max kurtosis mean distance based k-means initial centroid initialization method for big genomic data clustering
    Pandey, Kamlesh Kumar
    Shukla, Diwakar
    EVOLUTIONARY INTELLIGENCE, 2023, 16 (03) : 1055 - 1076
  • [46] Density-based clustering algorithm for numerical and categorical data with mixed distance measure methods
    Chen, Jin-Yin
    He, Hui-Hao
    Kongzhi Lilun Yu Yingyong/Control Theory and Applications, 2015, 32 (08) : 993 - 1002
  • [47] A GA-based clustering algorithm for large data sets with mixed numeric and categorical values
    Li, J
    Gao, XB
    Jiao, LC
    THIRD INTERNATIONAL SYMPOSIUM ON MULTISPECTRAL IMAGE PROCESSING AND PATTERN RECOGNITION, PTS 1 AND 2, 2003, 5286 : 171 - 174
  • [48] Metaheuristic-based possibilistic fuzzy k-modes algorithms for categorical data clustering
    Kuo, R. J.
    Zheng, Y. R.
    Thi Phuong Quyen Nguyen
    INFORMATION SCIENCES, 2021, 557 : 1 - 15
  • [49] Clustering with density based initialization and Bhattacharyya based merging
    Kose, Erdem
    Hocaoglu, Ali Koksal
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2022, 30 (03) : 502 - 517
  • [50] A Novel K-Means Clustering Method for Locating Urban Hotspots Based on Hybrid Heuristic Initialization
    Li, Yiping
    Zhou, Xiangbing
    Gu, Jiangang
    Guo, Ke
    Deng, Wu
    APPLIED SCIENCES-BASEL, 2022, 12 (16):