A k-means type clustering algorithm for subspace clustering of mixed numeric and categorical datasets

Cited by: 44
Authors
Ahmad, Amir [1]
Dey, Lipika [2]
Affiliations
[1] King Abdulaziz Univ, Fac Comp & Informat Technol, Rabigh, Saudi Arabia
[2] Tata Consultancy Serv, Innovat Labs, New Delhi, India
Keywords
Clustering; Subspace clustering; Mixed data; Categorical data
DOI
10.1016/j.patrec.2011.02.017
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Almost all subspace clustering algorithms proposed so far are designed for numeric datasets. In this paper, we present a k-means type clustering algorithm that finds clusters in data subspaces in mixed numeric and categorical datasets. In this method, we compute each attribute's contribution to different clusters. We propose a new cost function for a k-means type algorithm. One advantage of this algorithm is that its complexity is linear in the number of data points. The algorithm is also useful for describing cluster formation in terms of the attributes' contributions to different clusters. The algorithm is tested on various synthetic and real datasets to show its effectiveness. The clustering results are explained using the attribute weights in the clusters and are also compared with published results. (C) 2011 Elsevier B.V. All rights reserved.
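The abstract does not give the cost function itself, so the following is only a rough sketch of the general idea it describes: a k-means type loop over mixed data in which every cluster keeps its own attribute weights. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `weighted_mixed_kmeans`, the squared-difference and simple-mismatch distances, the exponential weight update, and the `gamma` parameter. Numeric attributes are assumed to be scaled to [0, 1] and categorical attributes to be integer-coded.

```python
import numpy as np

def weighted_mixed_kmeans(X_num, X_cat, k, n_iter=20, gamma=1.0, seed=0):
    """Illustrative sketch only; NOT the cost function from the paper.

    X_num: (n, p) float array, assumed scaled to [0, 1].
    X_cat: (n, q) integer-coded categorical array.
    Returns (labels, weights); weights[j, a] is the weight of attribute a
    in cluster j, with numeric attributes first, then categorical ones.
    """
    rng = np.random.default_rng(seed)
    n, p = X_num.shape
    q = X_cat.shape[1]
    m = p + q
    # Initialise each cluster from a distinct randomly chosen data point.
    idx = rng.choice(n, size=k, replace=False)
    centers = X_num[idx].astype(float)   # numeric part: means
    modes = X_cat[idx].copy()            # categorical part: modes
    weights = np.full((k, m), 1.0 / m)   # start with uniform weights
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Per-attribute distance of every point to every cluster: (n, k, m).
        d_num = (X_num[:, None, :] - centers[None, :, :]) ** 2
        d_cat = (X_cat[:, None, :] != modes[None, :, :]).astype(float)
        d = np.concatenate([d_num, d_cat], axis=2)
        # Assign each point to the cluster with the smallest weighted distance.
        labels = np.argmin((d * weights[None]).sum(axis=2), axis=1)
        for j in range(k):
            members = labels == j
            if not members.any():
                continue  # empty cluster: keep its previous centre and weights
            centers[j] = X_num[members].mean(axis=0)
            for a in range(q):
                vals, counts = np.unique(X_cat[members, a], return_counts=True)
                modes[j, a] = vals[np.argmax(counts)]
            # Within-cluster dispersion per attribute; a tight attribute
            # receives a larger weight in this cluster (assumed update rule).
            disp = np.concatenate([
                ((X_num[members] - centers[j]) ** 2).mean(axis=0),
                (X_cat[members] != modes[j]).mean(axis=0),
            ])
            w = np.exp(-disp / gamma)
            weights[j] = w / w.sum()
    return labels, weights
```

Each iteration touches every point once per cluster and attribute, so the cost of a pass grows linearly with the number of data points, which matches the complexity property the abstract highlights. The returned per-cluster weights are the kind of quantity that could be read off to describe which attributes shape each cluster.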
Pages: 1062-1069
Number of pages: 8
Related papers
24 in total
[1]  
AGGARWAL CC, 2000, P ACM SIGMOD
[2]  
Agrawal R., 1998, Proc. of ACM SIGMOD, P94
[3]   A k-mean clustering algorithm for mixed numeric and categorical data [J].
Ahmad, Amir ;
Dey, Lipika .
DATA & KNOWLEDGE ENGINEERING, 2007, 63 (02) :503-527
[4]   A method to compute distance between two categorical values of same attribute in unsupervised learning for categorical data set [J].
Ahmad, Amir ;
Dey, Lipika .
PATTERN RECOGNITION LETTERS, 2007, 28 (01) :110-118
[5]  
[Anonymous], 1988, Algorithms for Clustering Data
[6]  
BARBARA D, 2002, COOLCAT ENTROPY BASE, P582
[7]   On data labeling for clustering categorical data [J].
Chen, Hung-Leng ;
Chuang, Kun-Ta ;
Chen, Ming-Syan .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, 20 (11) :1458-1471
[8]   Density Conscious Subspace Clustering for High-Dimensional Data [J].
Chu, Yi-Hong ;
Huang, Jen-Wei ;
Chuang, Kun-Ta ;
Yang, De-Nian ;
Chen, Ming-Syan .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2010, 22 (01) :16-30
[9]   Reducing Redundancy in Subspace Clustering [J].
Chu, Yi-Hong ;
Chen, Ying-Ju ;
Yang, De-Nian ;
Chen, Ming-Syan .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (10) :1432-1446
[10]   Enhanced soft subspace clustering integrating within-cluster and between-cluster information [J].
Deng, Zhaohong ;
Choi, Kup-Sze ;
Chung, Fu-Lai ;
Wang, Shitong .
PATTERN RECOGNITION, 2010, 43 (03) :767-781