A new robust fuzzy clustering validity index for imbalanced data sets

Cited by: 27
Authors
Liu, Yun [1, 2]
Jiang, Yanfang [2 ]
Hou, Tao [1 ]
Liu, Fu [1 ]
Affiliations
[1] Jilin Univ, Coll Commun Engn, Changchun, Peoples R China
[2] First Hosp Jilin Univ, Changchun, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Clustering analysis; Clustering validity index; Fuzzy c-means method; Imbalanced data sets; DIFFERENT SIZES; ALGORITHMS;
DOI
10.1016/j.ins.2020.08.041
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Determining the number of clusters in a data set is a significant issue in clustering analysis, and candidate numbers are usually evaluated with a clustering validity index (CVI). Although several CVIs have been proposed, the imperfect clustering results that the fuzzy c-means (FCM) clustering algorithm produces on imbalanced data sets may affect their decisions. To address this problem, the impact of imperfect clustering results on the traditional CVI is first analyzed, and it is found that the distance between two imbalanced clusters shrinks, which in turn distorts the separation metric. Inspired by this, a new fuzzy CVI called the imbalanced index (IMI) is proposed in this paper. IMI is the ratio of the fuzzy compactness metric to the separation metric. Its main characteristic is a new definition of the separation metric, in which the imbalance ratio of two clusters is used to enlarge the distance between their centers. IMI is then employed to evaluate the clustering results of FCM on a variety of data sets and is compared with several well-known CVIs. The experimental results demonstrate that IMI is robust to the imperfect clustering results of FCM caused by imbalanced data distributions and achieves superior performance compared with other CVIs. © 2020 Elsevier Inc. All rights reserved.
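The abstract does not give the IMI formula, so the following is only a rough sketch of the general idea it describes: a compactness-to-separation ratio in which the squared distance between two cluster centers is scaled by the imbalance ratio of the two clusters. The function name `ratio_index`, the use of fuzzy cardinalities as cluster sizes, the larger-over-smaller form of the imbalance ratio, and the Xie-Beni-style minimum over center pairs are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def ratio_index(X, U, V, m=2.0, use_imbalance=True):
    """Compactness / separation for a fuzzy partition (illustrative only).

    X: (n, d) data, U: (c, n) memberships, V: (c, d) centers.
    This is NOT the paper's IMI definition; every modelling choice
    below is an assumption made for this sketch.
    """
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    V = np.asarray(V, dtype=float)

    # Squared Euclidean distance from every center to every point: (c, n).
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)

    # Fuzzy compactness: membership-weighted scatter, averaged over points.
    compactness = ((U ** m) * d2).sum() / X.shape[0]

    # Assumed cluster "size": fuzzy cardinality (sum of memberships).
    sizes = U.sum(axis=1)

    # Separation: smallest center-to-center squared distance, optionally
    # enlarged by the assumed imbalance ratio (larger size / smaller size).
    separation = np.inf
    c = V.shape[0]
    for i in range(c):
        for k in range(i + 1, c):
            dist2 = ((V[i] - V[k]) ** 2).sum()
            ratio = 1.0
            if use_imbalance:
                ratio = max(sizes[i], sizes[k]) / min(sizes[i], sizes[k])
            separation = min(separation, ratio * dist2)

    return compactness / separation
```

Under this reading, a smaller value would indicate a better partition, and one would run FCM for several candidate cluster counts and keep the count with the smallest index. With `use_imbalance=False` the sketch reduces to a plain Xie-Beni-style ratio, which makes the effect of the imbalance weighting easy to compare on the same partition.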
Pages: 579-591
Page count: 13