Study on density peaks clustering based on k-nearest neighbors and principal component analysis

Cited by: 371
Authors
Du, Mingjing [1, 2]
Ding, Shifei [1, 2]
Jia, Hongjie [1, 2]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100090, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data clustering; Density peaks; k nearest neighbors (KNN); Principal component analysis (PCA); ALGORITHM; SEARCH
DOI
10.1016/j.knosys.2016.02.001
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The density peaks clustering (DPC) algorithm, published in the US journal Science in 2014, is a novel density-based clustering algorithm. It requires neither an iterative process nor additional parameters. However, the original algorithm considers only the global structure of the data, which causes it to miss many clusters. In addition, DPC does not perform well on data sets of relatively high dimension; in particular, it produces the wrong number of clusters on real-world data sets. To overcome the first problem, we propose density peaks clustering based on k nearest neighbors (DPC-KNN), which introduces the idea of k nearest neighbors (KNN) into DPC and provides another option for computing the local density. To overcome the second problem, we introduce principal component analysis (PCA) into the DPC-KNN model and put forward a PCA-based method (DPC-KNN-PCA) that preprocesses high-dimensional data. Experiments on synthetic data sets demonstrate the feasibility of our algorithms. In experiments on real-world data sets, we compare the accuracy of our algorithms with that of the k-means algorithm and the spectral clustering (SC) algorithm. The experimental results show that our algorithms are feasible and effective. (C) 2016 Elsevier B.V. All rights reserved.
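The pipeline described in the abstract (PCA preprocessing, a KNN-based local density, then density-peak selection) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not state the density formula, so the form used here (the exponential of the negative mean squared distance to the k nearest neighbors) is an assumption, as are the function names pca_reduce and dpc_knn and the parameters k and n_clusters. Centers are chosen as the points with the largest product of density and distance to the nearest denser point, which is the usual DPC decision rule.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its first n_components principal directions (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def dpc_knn(X, k, n_clusters):
    """Density peaks clustering with a KNN-based local density (illustrative sketch)."""
    n = X.shape[0]
    # Full pairwise Euclidean distance matrix; fine for small n.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

    # Assumed density: exp(-mean squared distance to the k nearest neighbours).
    # Column 0 of each sorted row is the point itself (distance 0), so skip it.
    knn_d = np.sort(D, axis=1)[:, 1:k + 1]
    rho = np.exp(-(knn_d ** 2).mean(axis=1))

    # delta[i]: distance to the nearest point of higher density;
    # parent[i]: index of that point.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(D[i, higher])]
        delta[i] = D[i, j]
        parent[i] = j
    top = order[0]
    delta[top] = D.max()  # convention for the densest point

    # Centres: largest rho * delta; the densest point is always kept as a centre.
    gamma = rho * delta
    gamma[top] = np.inf
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]

    # Remaining points inherit the label of their nearest denser neighbour,
    # processed in order of decreasing density so the parent is already labelled.
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

A typical call would first reduce the data and then cluster it, for example `labels, centers = dpc_knn(pca_reduce(X, 2), k=5, n_clusters=2)`. The distance matrix is built densely for readability, so the sketch suits small data sets only.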
Pages: 135-145
Number of pages: 11