Finding compact and well-separated clusters: Clustering using silhouette coefficients

被引:100
作者
Bagirov, Adil M. [1 ]
Aliguliyev, Ramiz M. [2 ]
Sultanova, Nargiz [1 ]
机构
[1] Federat Univ Australia, Sch Engn Informat Technol & Phys Sci, Univ Dr, Mount Helen, Vic 3353, Australia
[2] Azerbaijan Natl Acad Sci, Inst Informat Technol, 9A B Vahabzade Str, AZ-1141 Baku, Azerbaijan
基金
澳大利亚研究理事会;
关键词
Cluster analysis; Cluster validity index; Silhouette coefficients; Nonsmooth optimization; Incremental algorithm; VALIDITY INDEX; ALGORITHM; NUMBER;
D O I
10.1016/j.patcog.2022.109144
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Finding compact and well-separated clusters in data sets is a challenging task. Most clustering algorithms try to minimize certain clustering objective functions. These functions usually reflect the intra-cluster similarity and inter-cluster dissimilarity. However, the use of such functions alone may not lead to the finding of well-separated and, in some cases, compact clusters. Therefore additional measures, called clus-ter validity indices, are used to estimate the true number of well-separated and compact clusters. Some of these indices are well-suited to be included into the optimization model of the clustering problem. Silhouette coefficients are among such indices. In this paper, a new optimization model of the clustering problem is developed where the clustering function is used as an objective and silhouette coefficients are used to formulate constraints. Then an algorithm, called CLUSCO (CLustering Using Silhouette COeffi-cients), is designed to construct clusters incrementally. Three schemes are discussed to reduce the com-putational complexity of the algorithm. Its performance is evaluated using fourteen real-world data sets and compared with that of three state-of-the-art clustering algorithms. Results show that the CLUSCO is able to compute compact clusters which are significantly better separable in comparison with those obtained by other algorithms.(c) 2022 Elsevier Ltd. All rights reserved.
引用
收藏
页数:15
相关论文
共 32 条
[1]   A new validity clustering index-based on finding new centroid positions using the mean of clustered data to determine the optimum number of clusters [J].
Abdalameer, Ahmed Khaldoon ;
Alswaitti, Mohammed ;
Alsudani, Ahmed Adnan ;
Isa, Nor Ashidi Mat .
EXPERT SYSTEMS WITH APPLICATIONS, 2022, 191
[2]   An extensive comparative study of cluster validity indices [J].
Arbelaitz, Olatz ;
Gurrutxaga, Ibai ;
Muguerza, Javier ;
Perez, Jesus M. ;
Perona, Inigo .
PATTERN RECOGNITION, 2013, 46 (01) :243-256
[3]  
Arthur D., 2007, ACM SIAM S DISCRETE
[4]  
Bagirov A., 2014, Introduction to Nonsmooth Optimization, DOI DOI 10.1007/978-3-319-08114-4
[5]   Discrete gradient method:: Derivative-free method for nonsmooth optimization [J].
Bagirov, A. M. ;
Karasoezen, B. ;
Sezer, M. .
JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2008, 137 (02) :317-334
[6]  
Bagirov A.M., 2020, Numerical Nonsmooth Optimization, DOI [10.1007/978-3-030-34910-3, DOI 10.1007/978-3-030-34910-3]
[7]   Modified global k-means algorithm for minimum sum-of-squares clustering problems [J].
Bagirov, Adil M. .
PATTERN RECOGNITION, 2008, 41 (10) :3192-3199
[8]   Piecewise partially separable functions and a derivative-free algorithm for large scale nonsmooth optimization [J].
Bagirov, Adil M. ;
Ugon, Julien .
JOURNAL OF GLOBAL OPTIMIZATION, 2006, 35 (02) :163-195
[9]   A new nonsmooth optimization algorithm for minimum sum-of-squares clustering problems [J].
Bagirov, AM ;
Yearwood, J .
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2006, 170 (02) :578-596
[10]  
Bagirov AM., 2020, PARTITIONAL CLUSTERI