Partition-and-merge based fuzzy genetic clustering algorithm for categorical data

被引:23
作者
Thi Phuong Quyen Nguyen [1 ]
Kuo, R. J. [1 ]
机构
[1] Natl Taiwan Univ Sci & Technol, Dept Ind Management, 43,Sect 4,Keelung Rd, Taipei, Taiwan
关键词
Categorical data; Fuzzy clustering; Genetic algorithm; Partition-and-merge; K-MODES ALGORITHM; ROCK;
D O I
10.1016/j.asoc.2018.11.028
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Categorical data clustering is a difficult and challenging task due to the special characteristic of categorical attributes: no natural order. Thus, this study aims to propose a two-stage method named partition-and-merge based fuzzy genetic clustering algorithm (PM-FGCA) for categorical data. The proposed PM-FGCA uses a fuzzy genetic clustering algorithm to partition the dataset into a maximum number of clusters in the first stage. Then, the merge stage is designed to select two clusters among the clusters that generated in the first stage based on its inter-cluster distances and merge two selected clusters to one cluster. This procedure is repeated until the number of clusters equals to the predetermined number of clusters. Thereafter, some particular instances in each cluster are considered to be re-assigned to other clusters based on the intra-cluster distances. The proposed PM-FGCA is implemented on ten categorical datasets from UCI machine learning repository. In order to evaluate the clustering performance, the proposed PM-FGCA is compared with some existing methods such as k-modes algorithm, fuzzy k-modes algorithm, genetic fuzzy k-modes algorithm, and non-dominated sorting genetic algorithm using fuzzy membership chromosomes. Adjusted Ranked Index (ARI), Normalized Mutual Information (NMI), and Davies-Bouldin (DB) index are selected as three clustering validation indices which are represented to both external index (i.e., ARI and NMI) and internal index (i.e., DB). Consequently, the experimental result shows that the proposed PM-FGCA outperforms the benchmark methods in terms of the tested indices. (C) 2018 Elsevier B.V. All rights reserved.
引用
收藏
页码:254 / 264
页数:11
相关论文
共 47 条
[1]   A hierarchical clustering algorithm for categorical attributes [J].
Agarwal, Parul ;
Alam, M. Afshar ;
Biswas, Ranjit .
2010 SECOND INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND APPLICATIONS: ICCEA 2010, PROCEEDINGS, VOL 2, 2010, :365-368
[2]   Efficient layered density-based clustering of categorical data [J].
Andreopoulos, Bill ;
An, Aijun ;
Wang, Xiaogang ;
Labudde, Dirk .
JOURNAL OF BIOMEDICAL INFORMATICS, 2009, 42 (02) :365-376
[3]  
Andritsos P, 2004, LECT NOTES COMPUT SC, V2992, P123
[4]  
[Anonymous], 2004, ACM SIGKDD EXPLOR NE
[5]  
Barbara D., 2002, Proceedings of the Eleventh International Conference on Information and Knowledge Management. CIKM 2002, P582, DOI 10.1145/584792.584888
[6]   Some new indexes of cluster validity [J].
Bezdek, JC ;
Pal, NR .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 1998, 28 (03) :301-315
[7]  
Bhagat P. M., 2013, INT J ENG ADV TECHNO, V3, P341
[8]  
Boriah S, 2008, SIAM INT C DAT MIN, P243, DOI [DOI 10.1137/1.9781611972788.22, 10.1137/1.9781611972788.22]
[9]   A fuzzy SV-k-modes algorithm for clustering categorical data with set-valued attributes [J].
Cao, Fuyuan ;
Huang, Joshua Zhexue ;
Liang, Jiye .
APPLIED MATHEMATICS AND COMPUTATION, 2017, 295 :1-15
[10]   A weighting k-modes algorithm for subspace clustering of categorical data [J].
Cao, Fuyuan ;
Liang, Jiye ;
Li, Deyu ;
Zhao, Xingwang .
NEUROCOMPUTING, 2013, 108 :23-30