Fast computation of cluster validity measures for bregman divergences and benefits

被引:4
作者
Capo, Marco [1 ]
Perez, Aritz [1 ]
Lozano, Jose A. [1 ,2 ]
机构
[1] Basque Ctr Appl Math, Bilbao 48009, Spain
[2] Univ Basque Country, Dept Comp Sci & Artifitial Intelligence, Intelligent Syst Grp, UPV EHU, San Sebastian 20018, Spain
关键词
Partitional clustering; Number of clusters; Silhouette index; Davies-Bouldin; Calinski-Harabasz; Bregman divergences;
D O I
10.1016/j.patrec.2023.05.001
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Partitional clustering is one of the most relevant unsupervised learning and pattern recognition tech-niques. Unfortunately, one of the main drawbacks of these methodologies refer to the fact that the num-ber of clusters is generally assumed to be known beforehand and automating its selection is not straight-forward. On the same token, internal validity measures, such as the Silhouette index, Davies-Bouldin and Caliski-Harabasz measures have emerged as the standard techniques to be used when comparing the goodness of clustering results obtained via different clustering methods. These measures take into con-sideration both the inter and intra-cluster simmilarities and can be adapted to different metrics. Unfortu-nately, their used has been hindered due to their large computational complexities, which are commonly quadratic with respect to the number of instances of the data set. In this work, we show that the time complexity of computing the most popular internal validity measures can be utterly reduced by making used of the within-cluster errors and different properties of the Bregman divergences. This contribution ultimately allows us to massively speed-up the selection of an adequate number of clusters for a given data set as verified with extensive empirical comparisons.(c) 2023 Published by Elsevier B.V.
引用
收藏
页码:100 / 105
页数:6
相关论文
共 31 条
  • [11] DISTORTION MEASURES FOR SPEECH PROCESSING
    GRAY, RM
    BUZO, A
    GRAY, AH
    MATSUYAMA, Y
    [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1980, 28 (04): : 367 - 376
  • [12] Comparison of internal clustering validation indices for prototype-based clustering
    Hämäläinen J.
    Jauhiainen S.
    Kärkkäinen T.
    [J]. Algorithms, 2017, 10 (03):
  • [13] Hamerly G, 2004, ADV NEUR IN, V16, P281
  • [14] Han J., 2011, MORGAN KAUFMAN SER D
  • [15] Data clustering: A review
    Jain, AK
    Murty, MN
    Flynn, PJ
    [J]. ACM COMPUTING SURVEYS, 1999, 31 (03) : 264 - 323
  • [16] Data clustering: 50 years beyond K-means
    Jain, Anil K.
    [J]. PATTERN RECOGNITION LETTERS, 2010, 31 (08) : 651 - 666
  • [17] Kaufman L., 1987, Statistical Data Analysis Based on the L1-Norm and Related Methods. First International Conference, P405
  • [18] Kaufman L., 1986, Pattern recognition in practice, P425, DOI DOI 10.1016/B978-0-444-87877-9.50039-X
  • [19] Using Particle Swarm Optimisation and the Silhouette Metric to Estimate the Number of Clusters, Select Features, and Perform Clustering
    Lensen, Andrew
    Xue, Bing
    Zhang, Mengjie
    [J]. APPLICATIONS OF EVOLUTIONARY COMPUTATION, EVOAPPLICATIONS 2017, PT I, 2017, 10199 : 538 - 554
  • [20] LLOYD SP, 1982, IEEE T INFORM THEORY, V28, P129, DOI 10.1109/TIT.1982.1056489