Natural neighbor-based clustering algorithm with local representatives

被引:59
作者
Cheng, Dongdong [1 ]
Zhu, Qingsheng [1 ]
Huang, Jinlong [1 ]
Yang, Lijun [1 ]
Wu, Quanwang [1 ]
机构
[1] Chongqing Univ, Coll Comp Sci, Chongqing Key Lab Software Theory & Technol, Chongqing 400044, Peoples R China
基金
中国国家自然科学基金;
关键词
Clustering; Natural neighbor; Local representatives; DENSITY; SEARCH;
D O I
10.1016/j.knosys.2017.02.027
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Clustering by identifying cluster centers is important for detecting patterns in a data set. However, many center-based clustering algorithms cannot process data sets containing non-spherical clusters. In this paper, we propose a novel clustering algorithm called NaNLORE based on natural neighbor and local representatives. Natural neighbor is a new neighbor concept and introduced to compute local density and find local representatives which are points with local maximum density. We first find local representatives and then select cluster centers from the local representatives. The density-adaptive distance is introduced to measure the distance between local representatives, which helps to solve the problem of clustering data sets with complex manifold structure. Cluster centers are characterized by higher density than their neighbors and a relatively large density-adaptive distance from any local representatives with higher density. In experiments, we compare the proposed algorithm NaNLORE with existing algorithms on synthetic and real data sets. Results show that NaNLORE performs better than existing algorithm, especially on clustering non-spherical data and manifold data. (C) 2017 Elsevier B.V. All rights reserved.
引用
收藏
页码:238 / 253
页数:16
相关论文
共 40 条
[1]  
[Anonymous], 2009, Finding Groups in Data: An Introduction to Cluster Analysis
[2]   MULTIDIMENSIONAL BINARY SEARCH TREES USED FOR ASSOCIATIVE SEARCHING [J].
BENTLEY, JL .
COMMUNICATIONS OF THE ACM, 1975, 18 (09) :509-517
[3]   Study on density peaks clustering based on k-nearest neighbors and principal component analysis [J].
Du, Mingjing ;
Ding, Shifei ;
Jia, Hongjie .
KNOWLEDGE-BASED SYSTEMS, 2016, 99 :135-145
[4]   Sparse Subspace Clustering: Algorithm, Theory, and Applications [J].
Elhamifar, Ehsan ;
Vidal, Rene .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2013, 35 (11) :2765-2781
[5]  
Ester M, 1996, P 2 INT C KNOWLEDGE, DOI DOI 10.5555/3001460.3001507
[6]   Clustering algorithm selection by meta-learning systems: A new distance-based problem characterization and ranking combination methods [J].
Ferrari, Daniel Gomes ;
de Castro, Leandro Nunes .
INFORMATION SCIENCES, 2015, 301 :181-194
[7]   Iterative shrinking method for clustering problems [J].
Fränti, P ;
Virmajoki, I .
PATTERN RECOGNITION, 2006, 39 (05) :761-775
[8]   Clustering by passing messages between data points [J].
Frey, Brendan J. ;
Dueck, Delbert .
SCIENCE, 2007, 315 (5814) :972-976
[9]  
Guha S., 1998, SIGMOD Record, V27, P73, DOI 10.1145/276305.276312
[10]   ROCK: A robust clustering algorithm for categorical attributes [J].
Guha, S ;
Rastogi, R ;
Shim, K .
15TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 1999, :512-521