A weighting k-modes algorithm for subspace clustering of categorical data

Cited by: 48
Authors
Cao, Fuyuan [1]
Liang, Jiye [1]
Li, Deyu [1]
Zhao, Xingwang [1]
Affiliations
[1] Shanxi Univ, Sch Comp & Informat Technol, Key Lab Computat Intelligence & Chinese Informat, Minist Educ, Taiyuan 030006, Shanxi, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Subspace clustering; Weight; k-Modes algorithm; Categorical data; Entropy
DOI
10.1016/j.neucom.2012.11.009
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional clustering algorithms treat all dimensions of an input data set equally. In high-dimensional data, however, data points are commonly clustered in subspaces, which means that classes of objects are categorized in subspaces rather than in the entire space. Subspace clustering is an extension of traditional clustering that seeks clusters in different subspaces of a data set. In this paper, a weighting k-modes algorithm is presented for subspace clustering of categorical data, and its time complexity is analyzed. The proposed algorithm adds a step to the k-modes clustering process that automatically computes the weight of every dimension in each cluster using complement entropy. The attribute weights can then be used to identify the subsets of important dimensions that categorize different clusters. The effectiveness of the proposed algorithm is demonstrated on real and synthetic data sets. (C) 2012 Elsevier B.V. All rights reserved.
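The weighting step described in the abstract can be illustrated with a minimal Python sketch. It assumes the usual definition of complement entropy for a categorical attribute, the sum of p(1 - p) over the value frequencies p within a cluster. The abstract does not give the paper's weight formula, so the mapping from entropy to weight used here (normalized 1 - entropy) is only an illustrative assumption, and the function names complement_entropy, cluster_attribute_weights, and weighted_mismatch are hypothetical.

```python
from collections import Counter


def complement_entropy(values):
    """Complement entropy of one categorical attribute: sum of p * (1 - p)."""
    n = len(values)
    if n == 0:
        return 0.0
    return sum((c / n) * (1 - c / n) for c in Counter(values).values())


def cluster_attribute_weights(cluster_rows):
    """Per-attribute weights for a single cluster.

    Assumption (not taken from the paper): an attribute whose values are
    more concentrated inside the cluster (lower complement entropy) gets a
    larger weight, and the weights are normalized to sum to 1.
    """
    n_attrs = len(cluster_rows[0])
    entropies = [
        complement_entropy([row[j] for row in cluster_rows])
        for j in range(n_attrs)
    ]
    # Complement entropy is always below 1, so every score is positive.
    scores = [1.0 - e for e in entropies]
    total = sum(scores)
    return [s / total for s in scores]


def weighted_mismatch(row, mode, weights):
    """Weighted simple-matching dissimilarity between an object and a mode."""
    return sum(w for x, m, w in zip(row, mode, weights) if x != m)


# Toy cluster: attribute 0 is constant, attribute 1 varies.
cluster = [("a", "x"), ("a", "y"), ("a", "z"), ("a", "x")]
weights = cluster_attribute_weights(cluster)
distance = weighted_mismatch(("a", "y"), ("a", "x"), weights)
```

In this toy cluster the constant attribute receives the larger weight, which is the sense in which per-cluster weights can point to the dimensions that matter for that cluster. In a full k-modes loop, weights of this kind would be recomputed for each cluster after the modes are updated.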
Pages: 23-30
Number of pages: 8
Related papers
50 in total
  • [1] A fuzzy k-modes algorithm for clustering categorical data
    Huang, ZX
    Ng, MK
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 1999, 7 (04) : 446 - 452
  • [2] A Global K-modes Algorithm for Clustering Categorical Data
    Bai Tian
    Kulikowski, C. A.
    Gong Leiguang
    Yang Bin
    Huang Lan
    Zhou Chunguang
    CHINESE JOURNAL OF ELECTRONICS, 2012, 21 (03) : 460 - 465
  • [3] A genetic k-modes algorithm for clustering categorical data
    Gan, GJ
    Yang, ZJ
    Wu, JH
    ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2005, 3584 : 195 - 202
  • [4] A genetic fuzzy k-Modes algorithm for clustering categorical data
    Gan, G.
    Wu, J.
    Yang, Z.
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (02) : 1615 - 1620
  • [5] Initialization of K-Modes Clustering for Categorical Data
    Li Tao-ying
    Chen Yan
    Jin Zhi-hong
    Li Ye
    2013 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE AND ENGINEERING (ICMSE), 2013, : 107 - 112
  • [6] An efficient k-modes algorithm for clustering categorical datasets
    Dorman, Karin S.
    Maitra, Ranjan
    STATISTICAL ANALYSIS AND DATA MINING, 2022, 15 (01) : 83 - 97
  • [7] Clustering categorical data: Soft rounding k-modes
    Gavva, Surya Teja
    Karthik, C. S.
    Punna, Sharath
    INFORMATION AND COMPUTATION, 2024, 296
  • [8] Clustering of Categorical Data Using Intuitionistic Fuzzy k-modes
    Mehta, Darshan
    Tripathy, B. K.
    PROCEEDINGS OF SIXTH INTERNATIONAL CONFERENCE ON SOFT COMPUTING FOR PROBLEM SOLVING (SOCPROS 2016), VOL 1, 2017, 546 : 254 - 263
  • [9] Categorical data clustering: 25 years beyond K-modes
    Dinh, Tai
    Wong, Hauchi
    Fournier-Viger, Philippe
    Lisik, Daniil
    Ha, Minh-Quyet
    Dam, Hieu-Chi
    Huynh, Van-Nam
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 272
  • [10] Attribute value weighting in k-modes clustering
    He, Zengyou
    Xu, Xiaofei
    Deng, Shengchun
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (12) : 15365 - 15369