Automatic Fuzzy Clustering Using Non-Dominated Sorting Particle Swarm Optimization Algorithm for Categorical Data

Times cited: 13
Authors
Thi Phuong Quyen Nguyen [1]
Kuo, R. J. [1]
Affiliation
[1] Natl Taiwan Univ Sci & Technol, Dept Ind Management, Taipei 106, Taiwan
Keywords
Automatic clustering; categorical data; local density; NSPSO; GENETIC ALGORITHM; K-MEANS; C-MEANS;
DOI
10.1109/ACCESS.2019.2927593
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Categorical data clustering has attracted considerable attention recently because of its importance in real-world applications, and many clustering methods have been proposed for categorical data. However, most existing algorithms require the number of clusters to be predefined, and this number is usually unknown in real-world problems. Only a few works have focused on automatic clustering, and they mainly handle numerical data. This study develops a novel automatic fuzzy clustering algorithm using non-dominated sorting particle swarm optimization (AFC-NSPSO) for categorical data. The proposed AFC-NSPSO algorithm automatically identifies the optimal number of clusters and produces the clustering result for the selected number of clusters. In addition, a new technique based on local density is investigated to identify the maximum number of clusters in a dataset. Several internal validation indices are used to select the final solution from the first Pareto front. Experiments on real-world datasets from the UCI Machine Learning Repository show that AFC-NSPSO is effective compared with several other existing automatic categorical clustering algorithms. This study also applies the proposed algorithm to analyze a real-world case study with an unknown number of clusters.
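The abstract names two generic building blocks without defining them: fuzzy memberships over categorical attributes, and choosing one solution from the first Pareto front with an internal validation index. The Python sketch below illustrates both under stated assumptions; it is not the AFC-NSPSO implementation. It assumes fuzzy k-modes-style memberships with simple matching dissimilarity and a fuzzifier `m > 1`, two objectives that are both minimized, and a hypothetical higher-is-better validity score per candidate. The function names and all numbers are illustrative only.

```python
from typing import List, Sequence


def simple_matching(x: Sequence[str], z: Sequence[str]) -> int:
    """Number of attributes on which two categorical objects disagree."""
    return sum(a != b for a, b in zip(x, z))


def fuzzy_memberships(x: Sequence[str], modes: List[Sequence[str]], m: float = 1.5) -> List[float]:
    """Assumed fuzzy k-modes-style memberships of x in each cluster mode (requires m > 1)."""
    d = [simple_matching(x, z) for z in modes]
    exact = [k for k, dk in enumerate(d) if dk == 0]
    if exact:
        # x coincides with one or more modes: share full membership among them
        return [1.0 / len(exact) if k in exact else 0.0 for k in range(len(modes))]
    p = 1.0 / (m - 1.0)
    return [1.0 / sum((dk / dh) ** p for dh in d) for dk in d]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b when every objective is minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def first_pareto_front(objectives: List[Sequence[float]]) -> List[int]:
    """Indices of candidates that no other candidate dominates (the rank-1 front)."""
    return [
        i for i, a in enumerate(objectives)
        if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)
    ]


def select_final(objectives: List[Sequence[float]], validity: List[float]) -> int:
    """Pick the first-front candidate with the highest (hypothetical) validity score."""
    return max(first_pareto_front(objectives), key=lambda i: validity[i])


if __name__ == "__main__":
    print(fuzzy_memberships(("a", "b", "c"), [("a", "b", "d"), ("x", "y", "c")], m=2.0))
    candidates = [(0.2, 0.9), (0.4, 0.5), (0.3, 0.95), (0.6, 0.4), (0.5, 0.6)]
    scores = [0.31, 0.42, 0.50, 0.28, 0.45]  # hypothetical validity-index values
    print(first_pareto_front(candidates))    # [0, 1, 3]
    print(select_final(candidates, scores))  # 1
```

In this toy run, candidate 2 has the highest score but is dominated by candidate 0, so it never reaches the selection step. The objectives and validity indices actually used in the paper may differ from these placeholders.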
Pages: 99721–99734
Number of pages: 14