A novel density peaks clustering algorithm based on Hopkins statistic

Cited by: 15
Authors
Zhang, Ruilin [1]
Miao, Zhenguo [1]
Tian, Ye [1]
Wang, Hongpeng [1, 2]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
Keywords
Clustering; Cluster validity index (CVI); Cluster center; Hopkins statistic; Density peaks; FAST SEARCH; NUMBER; FIND
DOI
10.1016/j.eswa.2022.116892
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Density peaks clustering (DPC) is a promising algorithm because it is straightforward and easy to implement. However, most of its improved variants still rely on expert knowledge, strong prior information, or complex iterations to identify the cluster centers, which inevitably adds subjectivity and instability. Moreover, some crisp and sensitive density metrics can reduce the representativeness of the centers, resulting in poor clustering. To this end, we propose an enhanced algorithm, called density peaks clustering based on the Hopkins statistic (DPC-AHS). The main property of the method is that it identifies cluster centers automatically, without prior information. Specifically, it uses a two-stage strategy: we first select some objects as candidate centers by linear regression and residual analysis. Subsequently, inspired by ideas from optimization, we design a novel validity index (AHS) that replaces the original decision graph in finding the desired centers among the candidates. Another novel part of DPC-AHS is the proposed adjusted k-nearest neighbors (A-kNN), which dynamically defines the neighbors during the process and further enhances robustness against outliers. Finally, we compare the performance of DPC-AHS with 7 state-of-the-art methods on synthetic, UCI, and image datasets. Experiments on 25 datasets and in-depth case discussions from 5 perspectives demonstrate that our algorithm is feasible and effective in clustering and center identification.
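For readers unfamiliar with the Hopkins statistic named in the title, the following is a minimal sketch of the classical statistic only. It is not the AHS validity index proposed in the paper, whose definition the abstract does not give. The function names, the bounding-box sampling window, the exponent equal to the data dimension, and the convention that values near 1 indicate clustered data are all assumptions made for illustration.

```python
import math
import random


def nearest_distance(point, data, skip_index=None):
    """Euclidean distance from `point` to its nearest neighbor in `data`."""
    best = math.inf
    for j, other in enumerate(data):
        if j == skip_index:  # do not match a sampled data point with itself
            continue
        best = min(best, math.dist(point, other))
    return best


def hopkins_statistic(data, m, seed=0):
    """Classical Hopkins statistic (illustrative; not the paper's AHS index).

    Assumed convention: values near 0.5 suggest spatially random data,
    values near 1 suggest a clustering tendency.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    lows = [min(p[k] for p in data) for k in range(dim)]
    highs = [max(p[k] for p in data) for k in range(dim)]

    # w: nearest-neighbor distances of m real points sampled without replacement
    sampled = rng.sample(range(len(data)), m)
    w = [nearest_distance(data[i], data, skip_index=i) for i in sampled]

    # u: nearest-neighbor distances of m uniform probes in the bounding box
    probes = [
        tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
        for _ in range(m)
    ]
    u = [nearest_distance(q, data) for q in probes]

    u_sum = sum(d ** dim for d in u)
    w_sum = sum(d ** dim for d in w)
    return u_sum / (u_sum + w_sum)


if __name__ == "__main__":
    rng = random.Random(1)
    # Two tight, well-separated blobs: a strongly clustered toy dataset.
    blob_a = [(rng.uniform(-0.01, 0.01), rng.uniform(-0.01, 0.01)) for _ in range(30)]
    blob_b = [(10 + rng.uniform(-0.01, 0.01), 10 + rng.uniform(-0.01, 0.01)) for _ in range(30)]
    print(hopkins_statistic(blob_a + blob_b, m=10))
```

On such strongly clustered toy data the uniform probes usually fall far from any data point while the sampled data points have very close neighbors, so the ratio is pushed toward 1.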
Pages: 18