k-ANMI: A mutual information based clustering algorithm for categorical data

Cited by: 48
Authors
He, Zengyou [1]
Xu, Xiaofei [1]
Deng, Shengchun [1]
Affiliations
[1] Harbin Inst Technol, Dept Comp Sci & Engn, Harbin 150001, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
clustering; categorical data; mutual information; cluster ensemble; data mining
DOI
10.1016/j.inffus.2006.05.006
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Clustering categorical data is an integral part of data mining and has attracted much attention recently. In this paper, we present k-ANMI, a new efficient algorithm for clustering categorical data. The k-ANMI algorithm works in a way that is similar to the popular k-means algorithm, and the goodness of clustering in each step is evaluated using a mutual information based criterion (namely, average normalized mutual information, ANMI) borrowed from cluster ensembles. The algorithm is easy to implement, requiring multiple hash tables as its only major data structure. Experimental results on real datasets show that the k-ANMI algorithm is competitive with state-of-the-art categorical data clustering algorithms with respect to clustering accuracy. (c) 2006 Elsevier B.V. All rights reserved.
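The ANMI criterion named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the definition commonly used in the cluster ensemble literature, where NMI(X, Y) = I(X; Y) / sqrt(H(X) H(Y)), and it assumes that each categorical attribute is treated as one partition of the objects, so that ANMI is the mean NMI between a candidate clustering and every attribute. The function names (`entropy`, `mutual_information`, `nmi`, `anmi`) are hypothetical, and the `Counter` objects stand in for the hash tables the abstract mentions.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two equally long label sequences."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))  # joint counts, kept in a hash table
    return sum(
        (n_ab / n) * log(n_ab * n / (count_a[x] * count_b[y]))
        for (x, y), n_ab in count_ab.items()
    )


def nmi(a, b):
    """Normalized mutual information with sqrt(H(a) * H(b)) normalization
    (an assumed convention). Returns 0.0 if either labeling is constant."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:
        return 0.0
    return mutual_information(a, b) / sqrt(h_a * h_b)


def anmi(cluster_labels, rows):
    """Average NMI between a clustering and each attribute column of `rows`.

    Assumption: every categorical attribute is viewed as one partition.
    """
    columns = list(zip(*rows))
    return sum(nmi(cluster_labels, col) for col in columns) / len(columns)


# Toy data: four objects, two categorical attributes.
rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
aligned = anmi([0, 0, 1, 1], rows)      # clustering matches both attributes
unrelated = anmi([0, 1, 0, 1], rows)    # clustering independent of both
```

In this toy data, the clustering that agrees with both attributes scores close to 1, and the one that is statistically independent of them scores 0, so a k-means-style search would prefer reassignments that raise the score.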
Pages: 223-233
Number of pages: 11