A weighting k-modes algorithm for subspace clustering of categorical data

被引:49
作者
Cao, Fuyuan [1 ]
Liang, Jiye [1 ]
Li, Deyu [1 ]
Zhao, Xingwang [1 ]
机构
[1] Shanxi Univ, Sch Comp & Informat Technol, Key Lab Computat Intelligence & Chinese Informat, Minist Educ, Taiyuan 030006, Shanxi, Peoples R China
基金
中国国家自然科学基金; 中国博士后科学基金;
关键词
Subspace clustering; Weight; k-Modes algorithm; Categorical data; ENTROPY;
D O I
10.1016/j.neucom.2012.11.009
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Traditional clustering algorithms consider all of the dimensions of an input data set equally. However, in the high dimensional data, a common property is that data points are highly clustered in subspaces, which means classes of objects are categorized in subspaces rather than the entire space. Subspace clustering is an extension of traditional clustering that seeks to find clusters in different subspaces within a data set. In this paper, a weighting k-modes algorithm is presented for subspace clustering of categorical data and its corresponding time complexity is analyzed as well. In the proposed algorithm, an additional step is added to the k-modes clustering process to automatically compute the weight of all dimensions in each cluster by using complement entropy. Furthermore, the attribute weight can be used to identify the subsets of important dimensions that categorize different clusters. The effectiveness of the proposed algorithm is demonstrated with real data sets and synthetic data sets. (C) 2012 Elsevier B.V. All rights reserved.
引用
收藏
页码:23 / 30
页数:8
相关论文
共 38 条
[1]  
Aggarwal CC, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P61, DOI 10.1145/304181.304188
[2]  
Aggarwal CC, 2000, SIGMOD REC, V29, P70, DOI 10.1145/335191.335383
[3]  
Agrawal R., 1998, SIGMOD Record, V27, P94, DOI 10.1145/276305.276314
[4]   Automatic subspace clustering of high dimensional data [J].
Agrawal, R ;
Gehrke, J ;
Gunopulos, D ;
Raghavan, P .
DATA MINING AND KNOWLEDGE DISCOVERY, 2005, 11 (01) :5-33
[5]  
[Anonymous], 2002, Technical Report
[6]  
[Anonymous], 2004, ACM SIGKDD EXPLOR NE
[7]  
[Anonymous], 2009, DAT GEN PERF DAT IMP
[8]  
[Anonymous], 2009, UCI MACHINE LEARNING
[9]   Selection of relevant features and examples in machine learning [J].
Blum, AL ;
Langley, P .
ARTIFICIAL INTELLIGENCE, 1997, 97 (1-2) :245-271
[10]   A Framework for Clustering Categorical Time-Evolving Data [J].
Cao, Fuyuan ;
Liang, Jiye ;
Bai, Liang ;
Zhao, Xingwang ;
Dang, Chuangyin .
IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2010, 18 (05) :872-882