Density peaks clustering based on k-nearest neighbors and self-recommendation

Cited by: 34
Authors
Sun, Lin [1]
Qin, Xiaoying [1]
Ding, Weiping [2]
Xu, Jiucheng [1]
Zhang, Shiguang [1]
Affiliations
[1] Henan Normal Univ, Coll Comp & Informat Engn, Xinxiang 453007, Henan, Peoples R China
[2] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Density peaks clustering; Microcluster; Neighbourhood; Local center; Self-recommendation strategy; ALGORITHM; INFORMATION;
DOI
10.1007/s13042-021-01284-x
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The density peaks clustering (DPC) model searches for density peaks and clusters data with arbitrary shapes for machine learning. However, it is difficult for DPC to select a cut-off distance when calculating the local density of points, and DPC easily ignores lower-density cluster centers in datasets with variable densities. In addition, for clusters with complex shapes, DPC selects only one cluster center per cluster, so the structure of the whole cluster is not fully reflected. To overcome these drawbacks, this paper presents a novel DPC model that merges microclusters based on k-nearest neighbors (kNN) and self-recommendation, called DPC-MC for short. First, the kNN-based neighbourhood of a point is defined, the mutual neighbour degree of a point is presented in this neighbourhood, and a new local density based on the mutual neighbour degree is proposed. This local density does not require the cut-off distance to be set manually. Second, to address the artificial setting of cluster centers, a self-recommendation strategy for local centers is provided. Third, after multiple local centers are selected, the binding degree of microclusters is developed to quantify the combination degree between a microcluster and its neighbour clusters. Homogeneous clusters are then found according to the binding degree of microclusters while boundary points are deleted layer by layer. The homologous clusters are merged, the points in the abnormal clusters are reallocated, and the clustering process ends. Finally, the DPC-MC algorithm is designed, and nine synthetic datasets and twenty-seven real-world datasets are used to verify its effectiveness. The experimental results demonstrate that the presented algorithm outperforms the compared algorithms on several clustering evaluation metrics.
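The abstract does not give the formulas for the mutual neighbour degree or the new local density, so the following is only a rough sketch of the general idea of a cut-off-free, kNN-based density. The function names (`knn_indices`, `mutual_knn_density`) and the weighting `exp(-d_ij)` over mutual neighbours are illustrative assumptions, not the definitions used in DPC-MC.

```python
import numpy as np

def knn_indices(X, k):
    """Return (idx, dist): the k nearest neighbours of each point and the
    full pairwise Euclidean distance matrix (diagonal set to inf)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, dist

def mutual_knn_density(X, k):
    """Illustrative local density with no cut-off distance (requires k < n).

    Points i and j are 'mutual neighbours' when each lies in the other's
    kNN list. The density of i is the sum of exp(-d_ij) over its mutual
    neighbours. This weighting is an assumption made for the sketch; the
    paper's own mutual neighbour degree may differ.
    """
    n = X.shape[0]
    idx, dist = knn_indices(X, k)
    is_nn = np.zeros((n, n), dtype=bool)
    is_nn[np.arange(n)[:, None], idx] = True  # is_nn[i, j]: j is in kNN(i)
    mutual = is_nn & is_nn.T                  # symmetric by construction
    rho = np.where(mutual, np.exp(-dist), 0.0).sum(axis=1)
    return rho, mutual

if __name__ == "__main__":
    # Four tightly spaced points and one far-away outlier (1-D toy data).
    X = np.array([[0.0], [1.0], [2.0], [3.0], [100.0]])
    rho, mutual = mutual_knn_density(X, k=2)
    print(rho)
```

In this toy setup the only user-chosen parameter is `k`; the isolated point has no mutual neighbours, so its density is zero without any distance threshold being specified.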
Pages: 1913-1938
Page count: 26