Density peaks clustering based on k-nearest neighbors and self-recommendation

Cited by: 34
Authors
Sun, Lin [1]
Qin, Xiaoying [1]
Ding, Weiping [2]
Xu, Jiucheng [1]
Zhang, Shiguang [1]
Affiliations
[1] Henan Normal Univ, Coll Comp & Informat Engn, Xinxiang 453007, Henan, Peoples R China
[2] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Density peaks clustering; Microcluster; Neighbourhood; Local center; Self-recommendation strategy; ALGORITHM; INFORMATION
DOI
10.1007/s13042-021-01284-x
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The density peaks clustering (DPC) model searches for density peaks and clusters data with arbitrary shapes for machine learning. However, it is difficult for DPC to select a cut-off distance when calculating the local density of points, and DPC easily ignores cluster centers with lower density in datasets with variable densities. In addition, for clusters with complex shapes, DPC selects only one cluster center per cluster, so the structure of the whole cluster is not fully reflected. To overcome these drawbacks, this paper presents a novel DPC model that merges microclusters based on k-nearest neighbors (kNN) and self-recommendation, called DPC-MC for short. First, the kNN-based neighbourhood of a point is defined and the mutual neighbour degree of a point is presented in this neighbourhood, and then a new local density based on the mutual neighbour degree is proposed. This local density does not require the cut-off distance to be set manually. Second, to address the artificial setting of cluster centers, a self-recommendation strategy for local centers is provided. Third, after the selection of multiple local centers, the binding degree of microclusters is developed to quantify the combination degree between a microcluster and its neighbour clusters. After that, homogeneous clusters are found according to the binding degree of microclusters during the process of deleting boundary points layer by layer. The homologous clusters are merged, the points in the abnormal clusters are reallocated, and then the clustering process ends. Finally, the DPC-MC algorithm is designed, and nine synthetic datasets and twenty-seven real-world datasets are used to verify the effectiveness of the algorithm. The experimental results demonstrate that the presented algorithm outperforms the compared algorithms in terms of several evaluation metrics for clustering.
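To make the baseline concrete, the following is a minimal sketch of the two quantities classic DPC relies on, with the cut-off-distance density swapped for a simple kNN-based one. It is not the DPC-MC method: the abstract does not give the mutual neighbour degree, the self-recommendation rule, or the binding degree, so none of them are reproduced here. The density form (inverse mean distance to the k nearest neighbours), the function names `dpc_quantities` and `pick_centers`, and the use of the product of density and separation to rank candidate centers are all assumptions made for illustration.

```python
import numpy as np


def dpc_quantities(points, k):
    """Return (rho, delta) for every point.

    rho   -- assumed kNN-based local density: the inverse of the mean
             distance to the k nearest neighbours (no cut-off distance).
    delta -- distance to the nearest point of strictly higher density;
             for a point with no denser point, its largest distance.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Full pairwise Euclidean distance matrix, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # After sorting each row, column 0 is the point itself (distance 0),
    # so columns 1..k hold the distances to the k nearest neighbours.
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]
    rho = 1.0 / (knn_dist.mean(axis=1) + 1e-12)

    delta = np.empty(n)
    for i in range(n):
        higher = np.flatnonzero(rho > rho[i])
        if higher.size == 0:
            delta[i] = dist[i].max()
        else:
            delta[i] = dist[i, higher].min()
    return rho, delta


def pick_centers(rho, delta, n_centers):
    """Rank points by rho * delta and return the top n_centers indices."""
    gamma = rho * delta
    return np.argsort(gamma)[::-1][:n_centers]
```

Calling `dpc_quantities(points, k)` on an array of shape (n, d) and passing the result to `pick_centers` yields candidate center indices. Note that the number of centers is still supplied by hand in this sketch, which is exactly the manual step that the self-recommendation strategy described in the abstract is meant to avoid.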
Pages: 1913-1938
Number of pages: 26