Exploring the uniform effect of FCM clustering: A data distribution perspective

Cited by: 52
Authors
Zhou, Kaile [1 ,2 ]
Yang, Shanlin [1 ,2 ]
Affiliations
[1] Hefei Univ Technol, Minist Educ, Key Lab Proc Optimizat & Intelligent Decis Making, Hefei 230009, Peoples R China
[2] Hefei Univ Technol, Sch Management, Hefei 230009, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Fuzzy c-means (FCM); Data distribution; Uniform effect; Coefficient of variation (CV); Clustering; FUZZY C-MEANS; VALIDITY INDEX; ALGORITHM; CLASSIFICATION; INFORMATION;
DOI
10.1016/j.knosys.2016.01.001
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fuzzy c-means (FCM) is a well-known and widely used fuzzy clustering method. Although many studies have focused on improving the FCM algorithm or on its applications, the effect of data distributions on the performance of FCM still needs to be understood. In this paper, we present an organized study of FCM clustering from the perspective of data distribution. We first analyze the structure of the FCM objective function and find that FCM has the same uniform effect as K-means: it also tends to produce clusters of relatively uniform sizes. The coefficient of variation (CV) is introduced to measure the variation of cluster sizes in a given data set. Then, based on the change in CV values between the original "true" cluster sizes and the cluster sizes produced by FCM clustering, a necessary but not sufficient criterion for validating FCM clustering is proposed from the data distribution perspective. Finally, our experiments on six synthetic data sets and ten real-world data sets further demonstrate the uniform effect of FCM. FCM tends to reduce the variation in cluster sizes when the CV value of the original data distribution is larger than 0.88, and to increase the variation when the variation of the original "true" cluster sizes is low. © 2016 Elsevier B.V. All rights reserved.
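The CV comparison described in the abstract can be illustrated with a minimal sketch. It assumes the CV is the sample standard deviation of the cluster sizes divided by their mean; the record does not give the paper's exact formula. The function names and the cluster sizes below are hypothetical, chosen only for illustration, and are not results from the paper.

```python
import statistics


def coefficient_of_variation(cluster_sizes):
    """CV of cluster sizes: sample standard deviation divided by the mean.

    Assumption: the sample (n - 1) standard deviation is used; the
    record does not state which estimator the paper adopts.
    """
    mean = statistics.mean(cluster_sizes)
    return statistics.stdev(cluster_sizes) / mean


def cv_change(true_sizes, fcm_sizes):
    """CV after clustering minus CV of the "true" sizes.

    A negative value means the partition is more uniform in size
    than the original distribution.
    """
    return coefficient_of_variation(fcm_sizes) - coefficient_of_variation(true_sizes)


# Hypothetical sizes for a three-cluster data set (illustration only).
true_sizes = [500, 60, 40]    # highly imbalanced "true" clusters
fcm_sizes = [260, 180, 160]   # sizes a uniform-leaning partition might yield

print(round(coefficient_of_variation(true_sizes), 4))
print(round(coefficient_of_variation(fcm_sizes), 4))
print(round(cv_change(true_sizes, fcm_sizes), 4))
```

In this made-up case the "true" CV is above the 0.88 threshold reported in the abstract and the change in CV is negative, which is the direction the uniform effect predicts for strongly imbalanced data.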
Pages: 76-83
Page count: 8