A Comparison of Categorical Attribute Data Clustering Methods

Cited by: 0
Authors
Hautamaki, Ville [1]
Pollanen, Antti [1]
Kinnunen, Tomi [1]
Lee, Kong Aik [2]
Li, Haizhou [2]
Franti, Pasi [1]
Affiliations
[1] Univ Eastern Finland, Sch Comp, Joensuu, Finland
[2] ASTAR, Inst Infocomm Res, Singapore, Singapore
Source
STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION | 2014 / Vol. 8621
Keywords
ALGORITHM;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering data in Euclidean space has a long tradition, and considerable attention has been paid to analyzing several different cost functions. Unfortunately, these results rarely generalize to clustering of categorical attribute data. Instead, a simple heuristic, k-modes, is the most commonly used method despite its modest performance. In this study, we model clusters by their empirical distributions and use expected entropy as the objective function. A novel clustering algorithm is designed based on local search for this objective function and compared against six existing algorithms on well-known data sets. The proposed method provides better clustering quality than the other iterative methods at the cost of higher time complexity.
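The abstract names expected entropy over per-cluster empirical distributions as the objective but does not spell out the formula. The following is a minimal sketch of one plausible reading, not the paper's algorithm: it assumes attributes are treated independently, so a cluster's entropy is the sum of its per-attribute entropies, and it weights each cluster by its share of the data. The function names `entropy` and `expected_entropy` and the toy data are hypothetical, and the local search step is not shown.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def expected_entropy(rows, labels):
    """Size-weighted average over clusters of summed per-attribute entropies.

    Assumption: attributes are treated independently, which the abstract
    does not confirm.
    """
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    total = len(rows)
    cost = 0.0
    for members in clusters.values():
        # zip(*members) transposes the member rows into attribute columns.
        h = sum(entropy(col) for col in zip(*members))
        cost += len(members) / total * h
    return cost


# Toy data: two categorical attributes, four objects.
rows = [("red", "small"), ("red", "small"), ("blue", "large"), ("blue", "large")]
pure = expected_entropy(rows, [0, 0, 1, 1])   # identical objects grouped together
mixed = expected_entropy(rows, [0, 1, 0, 1])  # each cluster mixes both kinds
```

Under this reading, a partition that groups identical objects together scores lower than one that mixes them, so a search procedure would prefer the labelling used for `pure` over the one used for `mixed`.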
Pages: 53-62
Page count: 10