R-map: Mapping categorical data for clustering and visualization based on reference sets

Cited by: 0
Authors
Shen, Zhi-Yong [1 ,3 ]
Sun, Jun [3 ]
Shen, Yi-Dong [1 ]
Li, Ming [2 ]
Affiliations
[1] Chinese Acad Sci, Grad Univ, Comp Sci Lab, Inst Software, Beijing 100864, Peoples R China
[2] Michigan State Univ, Dept Epidemiol, E Lansing, MI 48824 USA
[3] Graduated Univ, Chinese Acad Sci, Beijing, Peoples R China
Source
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS | 2008 / Vol. 5012
Keywords
clustering; data mapping; categorical data
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a framework that maps categorical data into a numerical data space via a reference set, so that existing numerical clustering algorithms can be applied directly to the generated image data set and the data can be visualized. Using statistical theory, we analyze the framework and give the conditions under which the data mapping is efficient and yet preserves a flexible property of the original data, i.e., data points within the same cluster are more similar to one another. The algorithm is simple and is effective under certain conditions. Experimental evaluation on numerous categorical data sets shows that it not only outperforms related data mapping approaches but also beats some categorical clustering algorithms in both effectiveness and efficiency.
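The abstract does not spell out the mapping itself, so the following is only a minimal sketch of the general idea of describing each categorical record by its relation to a small reference set. The simple matching distance, the hand-picked reference records, and the names `simple_matching_distance` and `map_to_reference_space` are all assumptions made for illustration and are not taken from the paper.

```python
def simple_matching_distance(a, b):
    """Fraction of attributes on which two categorical records differ.

    Assumption: the paper's actual dissimilarity measure is not given in
    the abstract; simple matching is used here only as a stand-in.
    """
    return sum(x != y for x, y in zip(a, b)) / len(a)


def map_to_reference_space(data, reference_set):
    """Map each categorical record to a numeric vector.

    Coordinate j of a record's image is its distance to reference record j,
    so the image lives in a space with len(reference_set) dimensions.
    """
    return [
        [simple_matching_distance(row, ref) for ref in reference_set]
        for row in data
    ]


# Toy categorical data set (hypothetical values).
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
]

# Assumption: two records are chosen by hand as the reference set.
reference_set = [data[0], data[2]]

images = map_to_reference_space(data, reference_set)
for row, image in zip(data, images):
    print(row, "->", image)
```

The resulting `images` list holds ordinary numeric vectors, so it could be passed to any numerical clustering routine or plotted directly when the reference set has two or three members.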
Pages: 992 / +
Number of pages: 2