A novel density peaks clustering algorithm based on Hopkins statistic

Cited by: 15

Authors:

Zhang, Ruilin [1]

Miao, Zhenguo [1]

Tian, Ye [1]

Wang, Hongpeng [1,2]

Affiliations:

[1] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen, Peoples R China

[2] Peng Cheng Lab, Shenzhen, Peoples R China

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2022 / Vol. 201

Keywords:

Clustering; Cluster validity index (CVI); Cluster center; Hopkins statistic; Density peaks; FAST SEARCH; NUMBER; FIND;

DOI:

10.1016/j.eswa.2022.116892

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Density peaks clustering (DPC) is a promising algorithm because it is straightforward and easy to implement. However, most of its improved variants still rely on expert knowledge, strong prior information, or complex iterations to identify the cluster centers, which inevitably adds subjectivity and instability. Moreover, some crisp and sensitive density metrics can reduce the representativeness of the centers, resulting in poor clustering. To this end, we propose an enhanced algorithm called density peaks clustering based on Hopkins statistic (DPC-AHS). The main property of the method is that it identifies cluster centers automatically, without prior information. Specifically, with a two-stage strategy, we first specify some objects as candidate centers by linear regression and residual analysis. Subsequently, inspired by the idea of optimization, we design a novel validity index (AHS) that replaces the original decision graph to find the desired centers among the candidates. Another novel part of DPC-AHS is the proposed adjusted k-nearest neighbors (A-kNN), which dynamically defines the neighbors during the process and further enhances the robustness against outliers. Finally, we compare the performance of DPC-AHS with 7 state-of-the-art methods on synthetic, UCI, and image datasets. Experiments on 25 datasets and in-depth case discussions from 5 perspectives demonstrate that our algorithm is feasible and effective in clustering and center identification.
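For orientation, the two ingredients the abstract builds on can be sketched in their textbook forms. This is not the authors' DPC-AHS: the AHS index, the regression-based candidate selection, and A-kNN are not specified in this record and are not reproduced here. The sketch only shows the baseline DPC quantities behind the decision graph (local density and distance to the nearest denser point) and the classical Hopkins statistic. The function names, the Gaussian kernel, and the parameters `d_c`, `m`, and `seed` are assumptions made for illustration.

```python
import math
import random


def dpc_rho_delta(points, d_c):
    """Baseline DPC quantities (assumed textbook form, not DPC-AHS)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Gaussian-kernel local density: close points contribute more than far ones.
    rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbor gets its largest distance instead.
        delta.append(min(denser) if denser else max(dist[i]))
    return rho, delta


def hopkins(points, m, seed=0):
    """Classical Hopkins statistic; values near 1 suggest clustered data."""
    rng = random.Random(seed)
    dim = len(points[0])
    lo = [min(p[k] for p in points) for k in range(dim)]
    hi = [max(p[k] for p in points) for k in range(dim)]
    # u: distances from uniform random probes to the nearest data point.
    u = []
    for _ in range(m):
        probe = [rng.uniform(lo[k], hi[k]) for k in range(dim)]
        u.append(min(math.dist(probe, p) for p in points))
    # w: distances from sampled data points to their nearest other data point.
    w = []
    for i in rng.sample(range(len(points)), m):
        w.append(min(math.dist(points[i], p)
                     for j, p in enumerate(points) if j != i))
    su = sum(x ** dim for x in u)
    sw = sum(x ** dim for x in w)
    return su / (su + sw)
```

In the original DPC, points with both large density and large separation are read off a decision graph by hand; the abstract describes replacing that manual step with a validity index.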

Pages: 18

References: 49 in total

