Categorical Data Clustering with Automatic Selection of Cluster Number

Cited by: 9
Authors
Liao, Hai-Yong [1, 2]
Ng, Michael K. [1, 2]
Affiliations
[1] Hong Kong Baptist Univ, Ctr Math Imaging & Vis, Kowloon Tong, Hong Kong, Peoples R China
[2] Hong Kong Baptist Univ, Dept Math, Kowloon Tong, Hong Kong, Peoples R China
Keywords
Categorical data; Clustering; Penalty; Regularization parameter
DOI
10.1007/s12543-009-0001-5
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
In this paper, we investigate the problem of determining the number of clusters k in k-modes based clustering of categorical data. We propose a new categorical data clustering algorithm that selects k automatically. The new algorithm extends the k-modes clustering algorithm by adding a penalty term to the objective function so that more clusters compete for objects. In the new objective function, a regularization parameter controls the number of clusters produced by the clustering process. Instead of finding k directly, we choose a value of the regularization parameter such that the corresponding clustering result is the most stable among all the generated clustering results. Experimental results on synthetic and real data sets demonstrate the effectiveness of the proposed algorithm.
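For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal sketch of plain k-modes only (simple matching dissimilarity, per-attribute mode updates), not the authors' penalized algorithm: the abstract does not state the form of the penalty term or the stability measure, so neither is reproduced here. The function names `matching_dissimilarity` and `k_modes`, and the `init_modes`, `max_iter`, and `seed` parameters, are illustrative assumptions.

```python
from collections import Counter
import random


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, k, init_modes=None, max_iter=20, seed=0):
    """Plain k-modes with a fixed k (no penalty term, no automatic selection).

    records: list of equal-length tuples of categorical values.
    init_modes: optional list of k starting modes; if omitted, k records
    are drawn at random (an assumption, not the paper's initialization).
    """
    rng = random.Random(seed)
    modes = list(init_modes) if init_modes is not None else rng.sample(records, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to its nearest mode.
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each mode takes the most frequent category per attribute.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mode
                modes[j] = tuple(
                    Counter(column).most_common(1)[0][0]
                    for column in zip(*members)
                )
    return labels, modes
```

In this baseline, k must be supplied by the caller; the approach described in the abstract instead varies a regularization parameter and keeps the most stable resulting clustering.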
Pages: 5-25
Page count: 21