Towards Parameter-Free Clustering for Real-World Data

Times cited: 12

Authors:

Hou, Jian [1]

Yuan, Huaqiang [1]

Pelillo, Marcello [2, 3]

Affiliations:

[1] Dongguan University of Technology, School of Computer Science and Technology, Dongguan 523808, People's Republic of China

[2] Ca' Foscari University of Venice, DAIS, I-30172 Venice, Italy

[3] Ca' Foscari University of Venice, European Centre for Living Technology, I-30123 Venice, Italy

Source:

PATTERN RECOGNITION | 2023 / Vol. 134

Keywords:

Clustering; Real-world data; Dominant set; Density peak; K-MEANS; ROBUST; CONVERGENCE

DOI:

10.1016/j.patcog.2022.109062

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

While many clustering algorithms have been published, existing algorithms often run into problems when processing real-world data. In this paper we present an algorithm that addresses two of these problems. First, most clustering algorithms depend on one or more parameters. Second, some algorithms are unsuitable for Gaussian-distributed clusters, whereas the clusters in many real datasets are approximately Gaussian. Our algorithm generates clusters sequentially, obtaining each cluster by expanding an initial cluster. The initial cluster is extracted with the dominant set algorithm, and the scaling parameter it involves is determined adaptively by studying the correlation between the pairwise data similarity matrix and the clustering result. To expand the initial cluster, we improve the density peak algorithm so that the expansion does not cross the boundary between two clusters and the density parameter it involves has little influence on the clustering results. The cluster expansion lets our algorithm work well with Gaussian-distributed clusters, and its two parameters can be either fixed or determined adaptively. Our algorithm thus takes a step toward parameter-free clustering of real-world data. In experiments on synthetic datasets composed of Gaussian clusters and on real datasets, it performs better than or comparably to several commonly used algorithms that require parameters. (c) 2022 Elsevier Ltd. All rights reserved.
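The abstract describes the pipeline only at a high level (sequential extraction, a dominant-set seed, density-peak expansion). The sketch below illustrates that general shape under loudly stated assumptions; it is not the authors' implementation. The scaling parameter `sigma` is fixed by hand (the paper chooses it adaptively from the similarity/clustering correlation, which is not reproduced here). The dominant set is computed with discrete replicator dynamics, the standard way dominant sets are extracted. Expansion uses plain density-peak assignment (each point follows its nearest higher-density neighbour), not the paper's improved boundary-respecting version; the `step_factor` guard is a hypothetical stand-in for that safeguard. All function names are illustrative.

```python
import math


def similarity_matrix(pts, sigma):
    """Gaussian similarity exp(-d^2 / (2 sigma^2)); zero diagonal, as dominant sets require."""
    n = len(pts)
    return [[0.0 if i == j else math.exp(-math.dist(pts[i], pts[j]) ** 2 / (2 * sigma ** 2))
             for j in range(n)]
            for i in range(n)]


def dominant_set(A, iters=2000, tol=1e-12, eps=1e-5):
    """One dominant set via discrete replicator dynamics x_i <- x_i (Ax)_i / (x'Ax)."""
    n = len(A)
    x = [1.0 / n] * n                                   # start at the simplex barycenter
    for _ in range(iters):
        Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        xAx = sum(xi * v for xi, v in zip(x, Ax))
        if xAx == 0.0:                                  # no positive similarity: singleton
            return [0]
        new = [xi * v / xAx for xi, v in zip(x, Ax)]
        converged = sum(abs(p - q) for p, q in zip(new, x)) < tol
        x = new
        if converged:
            break
    support = [i for i in range(n) if x[i] > eps]       # members = support of the final x
    return support or [max(range(n), key=lambda i: x[i])]


def expand(pts, seed, dc, step_factor=2.0):
    """Grow `seed` by density-peak style assignment plus a HYPOTHETICAL step guard."""
    n = len(pts)
    # Gaussian-kernel local density with cutoff distance dc
    rho = [sum(math.exp(-(math.dist(pts[i], pts[j]) / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    # parent[i]: nearest point of strictly higher density (None for the densest point)
    parent = []
    for i in range(n):
        higher = [j for j in range(n) if rho[j] > rho[i]]
        parent.append(min(higher, key=lambda j: math.dist(pts[i], pts[j])) if higher else None)
    # Hypothetical guard (not the paper's rule): reject steps longer than step_factor
    # times the largest nearest-neighbour distance inside the seed
    if len(seed) > 1:
        max_step = step_factor * max(min(math.dist(pts[i], pts[j]) for j in seed if j != i)
                                     for i in seed)
    else:
        max_step = 0.0
    cluster = set(seed)
    for i in sorted(range(n), key=lambda k: -rho[k]):   # parents are visited before children
        p = parent[i]
        if i not in cluster and p in cluster and math.dist(pts[i], pts[p]) <= max_step:
            cluster.add(i)
    return cluster


def cluster_sequentially(points, sigma, dc):
    """Extract clusters one at a time: dominant-set seed, expansion, removal."""
    remaining = list(range(len(points)))
    clusters = []
    while remaining:
        sub = [points[i] for i in remaining]
        members = expand(sub, dominant_set(similarity_matrix(sub, sigma)), dc)
        clusters.append(sorted(remaining[k] for k in members))
        remaining = [idx for k, idx in enumerate(remaining) if k not in members]
    return clusters


corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
outer = [(0.5, -1), (0.5, 2), (-1, 0.5), (2, 0.5)]
far = [(10, 10), (10, 11), (11, 10)]
print(cluster_sequentially(corners + outer + far, sigma=1.0, dc=1.0))
# -> [[0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10]]
```

In this toy run the dominant set of the first pass is only the four tightly packed corners; the four looser `outer` points are picked up by the expansion step, while the distant group is left for the second pass. This mirrors the division of labour the abstract describes: a compact, high-cohesion seed first, then growth toward the less dense parts of a roughly Gaussian cluster.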

Pages: 15
