Fuzzy C-Means clustering algorithm for data with unequal cluster sizes and contaminated with noise and outliers: Review and development

Times cited: 190
Author
Askari, Salar [1]
Affiliation
[1] Amirkabir Univ Technol, Tehran Polytech, Mech Engn Dept, 424 Hafez Ave, Tehran 1591634311, Iran
Keywords
Fuzzy C-Means; FCM; Clustering; Outlier; Noise; Unequal clusters; LOGICAL RELATIONSHIP GROUPS; TIME-SERIES; IMAGE SEGMENTATION; FORECASTING ALGORITHM; VALIDITY; CONVERGENCE; DBSCAN; MODEL;
DOI
10.1016/j.eswa.2020.113856
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Clustering algorithms aim to find dense regions of data based on the similarities and dissimilarities of data points. Noise and outliers enter the computational procedure of these algorithms just as the actual data points do, which leads to inaccurate and misplaced cluster centers. The same problem arises when clusters differ in size, because large clusters pull the centers of small clusters towards themselves. In engineering and physics, the mass of the data points matters in addition to their location, and a non-uniform mass distribution displaces the cluster centers towards heavier clusters even if the clusters are identical in size and the data are noise-free. The Fuzzy C-Means (FCM) algorithm, the most popular fuzzy clustering algorithm, suffers from all of these problems; it has been the subject of numerous studies and extensions, but the improvements remain marginal. This work revises the FCM algorithm to make it applicable to data with unequal cluster sizes, noise and outliers, and non-uniform mass distributions. The Revised FCM (RFCM) algorithm employs adaptive exponential functions to eliminate the impact of noise and outliers on the cluster centers, and it modifies the constraint of the FCM algorithm to prevent large or heavier clusters from attracting the centers of small clusters.
The paper also reviews several algorithms and discusses their mathematical structures, including Possibilistic Fuzzy C-Means (PFCM), Possibilistic C-Means (PCM), Robust Fuzzy C-Means (FCM-sigma), Noise Clustering (NC), Kernel Fuzzy C-Means (KFCM), Intuitionistic Fuzzy C-Means (IFCM), Robust Kernel Fuzzy C-Means (KFCM-sigma), Robust Intuitionistic Fuzzy C-Means (IFCM-sigma), Kernel Intuitionistic Fuzzy C-Means (KIFCM), Robust Kernel Intuitionistic Fuzzy C-Means (KIFCM-sigma), Credibilistic Fuzzy C-Means (CFCM), Size-insensitive integrity-based Fuzzy C-Means (siibFCM), Size-insensitive Fuzzy C-Means (csiFCM), Subtractive Clustering (SC), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Gaussian Mixture Models (GMM), Spectral clustering, and Outlier Removal Clustering (ORC). Some of these algorithms are suited to noisy data, while others are designed for data with unequal clusters. The study shows that the RFCM algorithm handles both cases and outperforms both categories of algorithms.
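The core problem named in the abstract, that every point (outliers included) pulls on the FCM centers, can be seen in a small sketch. The code below is the textbook alternating-optimization form of FCM, with an optional noise cluster in the style of Davé's Noise Clustering (NC), one of the algorithms the paper reviews. It is not the paper's RFCM: the adaptive exponential functions and the modified constraint are not reproduced here. The function name `fuzzy_cluster`, its parameters, the `masses` weighting of the center update (an assumed way to model point mass), the caller-supplied initial centers, and the toy data are all illustrative assumptions.

```python
import math


def fuzzy_cluster(points, centers, m=2.0, noise_delta=None, masses=None,
                  max_iter=200, tol=1e-9):
    """Sketch of FCM by alternating optimization, optionally with a noise cluster.

    points      : list of coordinate tuples x_k
    centers     : initial cluster centers v_i, chosen by the caller (assumption)
    m           : fuzzifier, m > 1
    noise_delta : if set, adds a noise cluster at constant distance delta from
                  every point (Dave-style Noise Clustering); None gives plain FCM
    masses      : hypothetical per-point masses w_k weighting the center update;
                  None means uniform mass (this is NOT the paper's RFCM scheme)
    Returns (centers, u), where u[i][k] is the membership of point k in cluster i.
    """
    n, c, dim = len(points), len(centers), len(points[0])
    w = list(masses) if masses is not None else [1.0] * n
    p = 2.0 / (m - 1.0)
    centers = [tuple(float(a) for a in v) for v in centers]
    u = [[0.0] * n for _ in range(c)]
    for _ in range(max_iter):
        # Membership step:
        # u_ik = 1 / (sum_j (d_ik / d_jk)^p  [+ (d_ik / delta)^p with a noise cluster])
        for k, x in enumerate(points):
            d = [math.dist(x, v) for v in centers]
            zeros = [i for i in range(c) if d[i] == 0.0]
            for i in range(c):
                if zeros:
                    # Point sits exactly on a center: share full membership among those centers.
                    u[i][k] = 1.0 / len(zeros) if i in zeros else 0.0
                    continue
                denom = sum((d[i] / d[j]) ** p for j in range(c))
                if noise_delta is not None:
                    denom += (d[i] / noise_delta) ** p
                u[i][k] = 1.0 / denom
        # Center step: v_i = sum_k w_k u_ik^m x_k / sum_k w_k u_ik^m
        new_centers = []
        for i in range(c):
            coef = [w[k] * u[i][k] ** m for k in range(n)]
            total = sum(coef)
            new_centers.append(tuple(
                sum(coef[k] * points[k][a] for k in range(n)) / total
                for a in range(dim)))
        shift = max(math.dist(v, nv) for v, nv in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


# Toy data: two equal square clusters plus one far outlier.
cluster_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
cluster_b = [(10, 0), (10, 1), (11, 0), (11, 1)]
data = cluster_a + cluster_b + [(100, 0.5)]
start = [(0, 0), (11, 1)]

fcm_centers, _ = fuzzy_cluster(data, start)
nc_centers, _ = fuzzy_cluster(data, start, noise_delta=10.0)
print(max(v[0] for v in fcm_centers))  # dragged well past cluster B by the outlier
print(max(v[0] for v in nc_centers))   # stays close to cluster B's true center
```

With the outlier present, plain FCM gives it substantial membership in both clusters, so one center is dragged far beyond cluster B; with the noise cluster, the outlier's memberships in the real clusters become tiny and the centers stay near (0.5, 0.5) and (10.5, 0.5). The value `noise_delta=10.0` is an assumption that suits this toy data: too small a value treats genuine points as noise, while too large a value lets the outlier regain influence.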
Pages: 27