Towards Parameter-Free Clustering for Real-World Data

Times cited: 12
Authors
Hou, Jian [1]
Yuan, Huaqiang [1]
Pelillo, Marcello [2,3]
Affiliations
[1] Dongguan Univ Technol, Sch Comp Sci & Technol, Dongguan 523808, Peoples R China
[2] Ca Foscari Univ, DAIS, I-30172 Venice, Italy
[3] Ca Foscari Univ, European Ctr Living Technol, I-30123 Venice, Italy
Keywords
Clustering; Real-world data; Dominant set; Density peak; K-MEANS; ROBUST; CONVERGENCE
DOI
10.1016/j.patcog.2022.109062
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Although many clustering algorithms have been published, existing algorithms often run into problems when processing real-world data. In this paper we present an algorithm that addresses two of these problems. First, most clustering algorithms depend on one or more parameters. Second, some algorithms are ill-suited to Gaussian-distributed clusters, even though the clusters in many real datasets are approximately Gaussian. Our algorithm generates clusters sequentially, obtaining each cluster by expanding an initial cluster. The initial cluster is extracted with the dominant set algorithm, and we determine the involved scaling parameter adaptively by studying the correlation between the pairwise data similarity matrix and the clustering result. To expand the initial cluster, we improve the density peak algorithm so that the expansion does not cross the boundary between two clusters and the involved density parameter has little influence on the clustering results. The cluster expansion enables our algorithm to work well with Gaussian-distributed clusters, and the two involved parameters can be either fixed or determined adaptively. Our algorithm is a step forward towards parameter-free clustering of real-world data: in experiments on synthetic datasets composed of Gaussian clusters and on real datasets, it performs better than or comparably to several commonly used algorithms that require parameters. (c) 2022 Elsevier Ltd. All rights reserved.
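The abstract outlines a two-stage scheme: extract an initial cluster with the dominant set algorithm, then grow it with a density-peak-style rule, and repeat on the remaining data. The sketch below illustrates only textbook versions of those building blocks, under stated assumptions. Dominant sets are extracted with discrete-time replicator dynamics on a Gaussian-kernel similarity matrix, and expansion lets a point join the cluster when its nearest higher-density neighbour is already a member. It is not the authors' method. The scaling parameter `sigma` is fixed by hand rather than chosen adaptively, and the hypothetical `max_link` distance cap is a crude stand-in for the authors' boundary-preserving improvement, which the abstract does not detail. All function names are illustrative.

```python
import math


def _sqdist(p, q):
    """Squared Euclidean distance between two equal-length coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def gaussian_similarity(points, sigma):
    """Pairwise similarities a_ij = exp(-d_ij^2 / sigma^2) with a zero diagonal.
    Assumption: sigma is fixed by hand here; the paper chooses it adaptively."""
    n = len(points)
    return [[0.0 if i == j else math.exp(-_sqdist(points[i], points[j]) / sigma ** 2)
             for j in range(n)] for i in range(n)]


def dominant_set(A, iters=1000, tol=1e-10, cutoff=1e-4):
    """Extract one dominant set with discrete-time replicator dynamics,
    x_i <- x_i * (A x)_i / (x^T A x), started from the simplex barycenter.
    Points whose final weight exceeds `cutoff` form the dominant set."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(iters):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        xAx = sum(xi * axi for xi, axi in zip(x, Ax))
        if xAx == 0.0:  # e.g. a single remaining point: nothing to update
            break
        new_x = [xi * axi / xAx for xi, axi in zip(x, Ax)]
        converged = sum(abs(a - b) for a, b in zip(new_x, x)) < tol
        x = new_x
        if converged:
            break
    return [i for i in range(n) if x[i] > cutoff]


def local_density(points, dc):
    """Gaussian-kernel local density rho_i = sum_{j != i} exp(-(d_ij / dc)^2)."""
    n = len(points)
    return [sum(math.exp(-_sqdist(points[i], points[j]) / dc ** 2)
                for j in range(n) if j != i) for i in range(n)]


def expand(points, seed, dc, max_link):
    """Grow `seed` with a plain density-peak rule: visit points in decreasing
    density and let a point join when its nearest higher-density neighbour is
    already a member. The hypothetical `max_link` cap keeps growth from jumping
    to a distant cluster; it is NOT the authors' boundary-preserving rule."""
    rho = local_density(points, dc)
    order = sorted(range(len(points)), key=lambda i: -rho[i])  # stable on ties
    members = set(seed)
    for rank, i in enumerate(order):
        if rank == 0 or i in members:
            continue  # the densest point has no higher-density neighbour
        parent = min(order[:rank], key=lambda j: _sqdist(points[i], points[j]))
        if parent in members and _sqdist(points[i], points[parent]) <= max_link ** 2:
            members.add(i)
    return sorted(members)


def cluster_sequentially(points, sigma, dc, max_link):
    """Extract clusters one at a time: a dominant set as the initial cluster,
    density-peak expansion, then remove the members and repeat."""
    labels = [-1] * len(points)
    remaining = list(range(len(points)))
    k = 0
    while remaining:
        sub = [points[i] for i in remaining]
        seed = dominant_set(gaussian_similarity(sub, sigma))
        local_members = set(expand(sub, seed, dc, max_link))
        for local in local_members:
            labels[remaining[local]] = k
        remaining = [i for pos, i in enumerate(remaining) if pos not in local_members]
        k += 1
    return labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2), (0.1, 0.1),
           (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
    print(cluster_sequentially(pts, sigma=1.0, dc=0.5, max_link=1.0))
```

On the toy data in the `__main__` block (a five-point group and a three-point group), this prints `[0, 0, 0, 0, 0, 1, 1, 1]`. The larger group wins the replicator dynamics first because its internal similarities give a higher value of x^T A x. Starting from the simplex barycenter is a common default; the removal-and-repeat loop mirrors the sequential cluster generation described in the abstract.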
Pages: 15