Using Particle Swarm Optimisation and the Silhouette Metric to Estimate the Number of Clusters, Select Features, and Perform Clustering

Cited by: 16
Authors
Lensen, Andrew [1]
Xue, Bing [1]
Zhang, Mengjie [1]
Affiliations
[1] Victoria Univ Wellington, Sch Engn & Comp Sci, POB 600, Wellington 6140, New Zealand
Source
APPLICATIONS OF EVOLUTIONARY COMPUTATION, EVOAPPLICATIONS 2017, PT I | 2017 / Vol. 10199
Keywords
Particle swarm optimisation; Clustering; Feature selection; Automatic clustering; Silhouette
DOI
10.1007/978-3-319-55849-3_35
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
One of the most difficult problems in clustering, the task of grouping similar instances in a dataset, is automatically determining the number of clusters that should be created. When a dataset has a large number of attributes (features), this task becomes even more difficult due to the relationship between the number of features and the number of clusters produced. One method of addressing this is feature selection, the process of selecting a subset of features to be used. Evolutionary computation techniques have been used very effectively for solving clustering problems, but have seen little use for simultaneously performing the three tasks of clustering, feature selection, and determining the number of clusters. Furthermore, the few existing methods have a number of limitations that affect their performance and scalability. In this work, we introduce a number of novel techniques for improving the performance of these three tasks using particle swarm optimisation and statistical techniques. We conduct a series of experiments across a range of datasets with clustering problems of varying difficulty. The results show that our proposed methods achieve significantly better clustering performance than existing methods, while using only a small number of features and determining the number of clusters more accurately.
Pages: 538-554
Page count: 17