CLUSTERING CATEGORICAL DATA BASED ON COMBINATIONS OF ATTRIBUTE VALUES

Citations: 0
Authors
Do, Hee-Jung [1]
Kim, Jae Yearn [1]
Affiliation
[1] Hanyang Univ, Dept Ind Engn, Seoul 133791, South Korea
Source
INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL | 2009 / Vol. 5 / Issue 12A
Keywords
Data mining; Clustering; Categorical data;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering is an important technique for exploratory data analysis. While most earlier clustering algorithms focused on numerical data, real-world problems and data mining applications frequently involve categorical data. Here, we propose a new clustering algorithm for categorical data that is based on the frequency of attribute value combinations. Our algorithm finds all the combinations of attribute values in an object, each of which represents a subset of the object's attribute values, and then groups the object using the frequency of these combinations in each cluster. Because our algorithm considers all the subsets of attribute values in an object, objects in a cluster have not only similar attribute value sets but also strongly associated attribute values. In addition, the proposed algorithm does not cluster using the similarity between only two objects; rather, it uses the similarity between an object and the clusters. It therefore provides global information in the clustering results. We conducted experiments with real and synthetic data sets to evaluate FAVC. We show that FAVC is more scalable and provides higher quality results than the previous method.
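The idea in the abstract (enumerate every combination of an object's attribute values, then place the object in the cluster where those combinations are most frequent) can be illustrated with a minimal sketch. This is not the authors' FAVC implementation: the abstract does not give the scoring formula, so the `score` function below (mean relative frequency of the object's combinations within a cluster) is an assumption, and all names (`value_combinations`, `new_cluster`, `add`, `score`, `best_cluster`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def value_combinations(obj):
    """Return every non-empty combination of (attribute index, value) pairs."""
    items = tuple(enumerate(obj))
    return [c for r in range(1, len(items) + 1) for c in combinations(items, r)]


def new_cluster():
    """A cluster keeps combination counts and the number of member objects."""
    return {"counts": Counter(), "size": 0}


def add(cluster, obj):
    """Insert obj into cluster and update the combination frequencies."""
    cluster["counts"].update(value_combinations(obj))
    cluster["size"] += 1


def score(obj, cluster):
    """Assumed score (not from the paper): mean relative frequency,
    within the cluster, of the combinations found in obj."""
    if cluster["size"] == 0:
        return 0.0
    combos = value_combinations(obj)
    total = sum(cluster["counts"][c] for c in combos)
    return total / (cluster["size"] * len(combos))


def best_cluster(obj, clusters):
    """Index of the cluster whose stored combinations best match obj."""
    scores = [score(obj, c) for c in clusters]
    return max(range(len(clusters)), key=scores.__getitem__)


# Illustrative data only: two small clusters and one object to place.
clusters = [new_cluster(), new_cluster()]
add(clusters[0], ("red", "small", "round"))
add(clusters[0], ("red", "small", "square"))
add(clusters[1], ("blue", "large", "round"))
target = best_cluster(("red", "small", "oval"), clusters)
add(clusters[target], ("red", "small", "oval"))
```

Because the object is compared against per-cluster frequency tables rather than against individual objects, the comparison is object-to-cluster, as the abstract describes. Note that an object with m attributes has 2^m - 1 non-empty combinations, so a naive enumeration like this one is practical only for a small number of attributes.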
Pages: 4393-4405
Number of pages: 13