Clustering incomplete data using kernel-based fuzzy C-means algorithm

被引:263
作者
Zhang, DQ [1 ]
Chen, SC [1 ]
机构
[1] Nanjing Univ Aeronaut & Astronaut, Dept Comp Sci & Engn, Nanjing 210016, Peoples R China
关键词
clustering; fuzzy c-means; incomplete data; Kernel methods;
D O I
10.1023/B:NEPL.0000011135.19145.1b
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
There is a trend in recent machine learning community to construct a nonlinear version of a linear algorithm using the 'kernel method', e.g. Support Vector Machines (SVMs), kernel principal component analysis, kernel fisher discriminant analysis and the recent kernel clustering algorithms. In unsupervised clustering algorithms using kernel method, typically, a nonlinear mapping is used first to map the data into a potentially much higher feature space, where clustering is then performed. A drawback of these kernel clustering algorithms is that the clustering prototypes lie in high dimensional feature space and hence lack clear and intuitive descriptions unless using additional projection approximation from the feature to the data space as done in the existing literatures. In this paper, a novel clustering algorithm using the 'kernel method' based on the classical fuzzy clustering algorithm (FCM) is proposed and called as kernel fuzzy c-means algorithm (KFCM). KFCM adopts a new kernel-induced metric in the data space to replace the original Euclidean norm metric in FCM and the clustered prototypes still lie in the data space so that the clustering results can be reformulated and interpreted in the original space. Our analysis shows that KFCM is robust to noise and outliers and also tolerates unequal sized clusters. And finally this property is utilized to cluster incomplete data. Experiments on two artificial and one real datasets show that KFCM has better clustering performance and more robust than several modi. cations of FCM for incomplete data clustering.
引用
收藏
页码:155 / 162
页数:8
相关论文
共 14 条
[1]  
[Anonymous], 1986, PRINCIPAL COMPONENT
[2]  
[Anonymous], Pattern Recognition With Fuzzy Objective Function Algorithms
[3]   PYRAMIDAL CLASSIFICATION BASED ON INCOMPLETE DISSIMILARITY DATA [J].
GAUL, W ;
SCHADER, M .
JOURNAL OF CLASSIFICATION, 1994, 11 (02) :171-193
[4]   Mercer kernel-based clustering in feature space [J].
Girolami, M .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 2002, 13 (03) :780-784
[5]   Fuzzy c-means clustering of incomplete data [J].
Hathaway, RJ ;
Bezdek, JC .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2001, 31 (05) :735-744
[6]   Clustering incomplete relational data using the non-Euclidean relational fuzzy c-means algorithm [J].
Hathaway, RJ ;
Bezdek, JC .
PATTERN RECOGNITION LETTERS, 2002, 23 (1-3) :151-160
[7]  
Jain K, 1988, Algorithms for clustering data
[8]  
Li Zhang, 2002, Chinese Journal of Computers, V25, P587
[9]   Sumo, ubiquitin's mysterious cousin [J].
Müller, S ;
Hoege, C ;
Pyrowolakis, G ;
Jentsch, S .
NATURE REVIEWS MOLECULAR CELL BIOLOGY, 2001, 2 (03) :202-210
[10]  
Rudin Walter, 1976, Principles of mathematical analysis, V3