Efficient algorithms based on the k-means and Chaotic League Championship Algorithm for numeric, categorical, and mixed-type data clustering

Cited by: 25
Authors
Wangchamhan, Tanachapong [1]
Chiewchanwattana, Sirapat [1]
Sunat, Khamron [1]
Affiliations
[1] Khon Kaen Univ, Dept Comp Sci, Fac Sci, Khon Kaen 40002, Thailand
Keywords
Data clustering; Search clustering algorithm; Hybrid clustering algorithm; League Championship Algorithm (LCA); Chaos optimization algorithms (COA); Mixed-type data; OPTIMIZATION ALGORITHM; GLOBAL OPTIMIZATION; SEARCH
DOI
10.1016/j.eswa.2017.08.004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The success rates of expert or intelligent systems depend on the selection of the correct data clusters. The k-means algorithm is a well-known method for solving data clustering problems. It suffers not only from a high dependency on the algorithm's initial solution but also from the distance function used. A number of algorithms have been proposed to address the centroid initialization problem, but the resulting solutions do not yield optimum clusters. This paper proposes three algorithms: (i) the search algorithm C-LCA, an improved League Championship Algorithm (LCA); (ii) a search clustering algorithm using C-LCA (SC-LCA); and (iii) a hybrid clustering algorithm, the hybrid of k-means and Chaotic League Championship Algorithm (KSC-LCA), which has two computation stages. The C-LCA employs chaotic adaptation for the retreat and approach parameters, rather than constants, which can enhance the search capability. Furthermore, to overcome the limitation of the original k-means algorithm, whose Euclidean distance cannot handle categorical attributes properly, we adopt the Gower distance and a mechanism for handling the discrete-value requirement of categorical attributes. The proposed algorithms can handle not only pure numeric data but also mixed-type data, and can find the best centroids containing categorical values. Experiments were conducted on 14 datasets from the UCI repository. The SC-LCA and KSC-LCA competed with 16 established algorithms, including the k-means, k-means++, and global k-means algorithms, four search clustering algorithms, and nine hybrids of the k-means algorithm with several state-of-the-art evolutionary algorithms. The experimental results show that the SC-LCA produces the clusters with the highest F-Measure on the pure categorical dataset and the KSC-LCA produces the clusters with the highest F-Measure on the pure numeric and mixed-type datasets tested.
On 13 of the 14 datasets, the centroids produced by the SC-LCA had better F-Measures than those of the k-means algorithm. On the Tic-Tac-Toe dataset, which contains only categorical attributes, the SC-LCA achieves an F-Measure of 66.61, which is 21.74 points above that of the k-means algorithm (44.87). The KSC-LCA produced better centroids than the k-means algorithm on all 14 datasets; the maximum F-Measure improvement was 11.59 points. However, in terms of computational time, the SC-LCA and KSC-LCA took more function evaluations (NFEs) than the k-means algorithm and its variants, but the KSC-LCA ranks first and the SC-LCA ranks fourth among the hybrid clustering and search clustering algorithms that we tested. Therefore, the SC-LCA and KSC-LCA are general and effective clustering algorithms that could be used when an expert or intelligent system requires accurate, high-speed cluster selection. © 2017 Elsevier Ltd. All rights reserved.
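The abstract says the Gower distance replaces the Euclidean distance so that categorical attributes can be compared. The following is a minimal sketch of the standard, unweighted Gower distance and a nearest-centroid assignment, not the authors' implementation. The function names, the equal attribute weights, and the choice to treat a zero-range numeric attribute as contributing 0 are all assumptions made for illustration.

```python
from typing import List, Sequence, Union

Value = Union[float, str]


def gower_distance(
    a: Sequence[Value],
    b: Sequence[Value],
    is_categorical: Sequence[bool],
    ranges: Sequence[float],
) -> float:
    """Unweighted Gower distance between two mixed-type records.

    Assumptions (not from the paper): all attributes have equal weight,
    and `ranges[i]` holds max - min of numeric attribute i over the
    dataset (the entry is ignored for categorical attributes).
    """
    if not a or len(a) != len(b):
        raise ValueError("records must be non-empty and of equal length")
    total = 0.0
    for x, y, categorical, value_range in zip(a, b, is_categorical, ranges):
        if categorical:
            # Categorical attribute: simple mismatch, 0 if equal else 1.
            total += 0.0 if x == y else 1.0
        elif value_range > 0:
            # Numeric attribute: absolute difference scaled by its range.
            total += abs(float(x) - float(y)) / value_range
        # A numeric attribute with zero range contributes nothing.
    return total / len(a)


def nearest_centroid(
    record: Sequence[Value],
    centroids: List[Sequence[Value]],
    is_categorical: Sequence[bool],
    ranges: Sequence[float],
) -> int:
    """Index of the centroid closest to `record` under the Gower distance."""
    distances = [
        gower_distance(record, c, is_categorical, ranges) for c in centroids
    ]
    return distances.index(min(distances))
```

Because each per-attribute term lies in [0, 1], the result also lies in [0, 1], which puts numeric and categorical attributes on a common scale. A centroid in this sketch may hold a categorical value directly, such as `(5.0, "o")`.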
Pages: 146-167
Page count: 22