Categorical Data Clustering with Automatic Selection of Cluster Number

Cited by: 9
Authors
Liao, Hai-Yong [1, 2]
Ng, Michael K. [1, 2]
Affiliations
[1] Hong Kong Baptist Univ, Ctr Math Imaging & Vis, Kowloon Tong, Hong Kong, Peoples R China
[2] Hong Kong Baptist Univ, Dept Math, Kowloon Tong, Hong Kong, Peoples R China
Keywords
Categorical data; Clustering; Penalty; Regularization parameter
DOI
10.1007/s12543-009-0001-5
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
In this paper, we investigate the problem of determining the number of clusters in the k-modes-based categorical data clustering process. We propose a new categorical data clustering algorithm with automatic selection of k. The new algorithm extends the k-modes clustering algorithm by adding a penalty term to the objective function so that more clusters compete for objects. In the new objective function, a regularization parameter controls the number of clusters produced by the clustering process. Instead of finding k directly, we choose a value of the regularization parameter such that the corresponding clustering result is the most stable among all the generated clustering results. Experimental results on synthetic and real data sets demonstrate the effectiveness of the proposed algorithm.
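For orientation, the following is a minimal sketch of the standard k-modes baseline that the abstract says the proposed algorithm extends. It is not the authors' method: the abstract does not give the form of the penalty term or how the regularization parameter enters the objective, so neither is modeled here. The function names `hamming` and `k_modes`, the deterministic initialization, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(data, k, max_iter=100):
    """Plain k-modes on a list of equal-length tuples of categorical values.

    Assumption: modes are initialized to the first k distinct objects so the
    sketch is deterministic; practical implementations usually randomize this.
    """
    modes = []
    for row in data:
        if row not in modes:
            modes.append(row)
        if len(modes) == k:
            break
    if len(modes) < k:
        raise ValueError("data contains fewer than k distinct objects")

    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest mode.
        new_labels = [
            min(range(k), key=lambda j: hamming(row, modes[j])) for row in data
        ]
        if new_labels == labels:
            break  # assignments are unchanged, so the modes are too
        labels = new_labels
        # Update step: each mode takes the most frequent value per attribute.
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:
                modes[j] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )

    # k-modes objective: total dissimilarity of objects to their cluster modes.
    cost = sum(hamming(row, modes[lab]) for row, lab in zip(data, labels))
    return labels, modes, cost


# Hypothetical toy data: six objects described by three categorical attributes.
data = [
    ("a", "x", "p"), ("b", "y", "q"), ("a", "x", "q"),
    ("b", "y", "p"), ("a", "x", "p"), ("b", "y", "q"),
]
labels, modes, cost = k_modes(data, k=2)
print(labels, modes, cost)
```

Note that k is a fixed input in this baseline. The abstract's contribution is to avoid fixing k in advance by tuning a regularization parameter and selecting the most stable clustering result, a step this sketch does not attempt to reproduce.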
Pages: 5-25
Number of pages: 21