RNN-DBSCAN: A Density-Based Clustering Algorithm Using Reverse Nearest Neighbor Density Estimates

被引:215
作者
Bryant, Avory [1 ,2 ]
Cios, Krzysztof [3 ,4 ]
机构
[1] Virginia Commonwealth Univ, Dept Comp Sci, Med Coll Virginia Campus, Richmond, VA 23284 USA
[2] Naval Surface Warfare Ctr Dahlgren Div, Dahlgren, VA 22448 USA
[3] Virginia Commonwealth Univ, Comp Sci Dept, Med Coll Virginia Campus, Richmond, VA 23284 USA
[4] Polish Acad Sci, PL-20290 Lublin, Poland
关键词
Unsupervised learning; pattern analysis; clustering algorithms; pattern clustering; density estimation robust algorithm; nearest neighbor searches;
D O I
10.1109/TKDE.2017.2787640
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
A new density-based clustering algorithm, RNN-DBSCAN, is presented which uses reverse nearest neighbor counts as an estimate of observation density. Clustering is performed using a DBSCAN-like approach based on k nearest neighbor graph traversals through dense observations. RNN-DBSCAN is preferable to the popular density-based clustering algorithm DBSCAN in two aspects. First, problem complexity is reduced to the use of a single parameter (choice of k nearest neighbors), and second, an improved ability for handling large variations in cluster density (heterogeneous density). The superiority of RNN-DBSCAN is demonstrated on several artificial and real-world datasets with respect to prior work on reverse nearest neighbor based clustering approaches (RECORD, IS-DBSCAN, and ISB-DBSCAN) along with DBSCAN and OPTICS. Each of these clustering approaches is described by a common graph-based interpretation wherein clusters of dense observations are defined as connected components, along with a discussion on their computational complexity. Heuristics for RNN-DBSCAN parameter selection are presented, and the effects of k on RNN-DBSCAN clusterings discussed. Additionally, with respect to scalability, an approximate version of RNN-DBSCAN is presented leveraging an existing approximate k nearest neighbor technique.
引用
收藏
页码:1109 / 1121
页数:13
相关论文
共 37 条
[1]  
Ankerst M., 1999, SIGMOD Record, V28, P49, DOI 10.1145/304181.304187
[2]  
[Anonymous], 1990, P 1990 ACM SIGMOD IN, DOI DOI 10.1145/93597.98741
[3]  
[Anonymous], 2011, WWW
[4]   MULTIDIMENSIONAL BINARY SEARCH TREES USED FOR ASSOCIATIVE SEARCHING [J].
BENTLEY, JL .
COMMUNICATIONS OF THE ACM, 1975, 18 (09) :509-517
[5]  
Beyer K, 1999, LECT NOTES COMPUT SC, V1540, P217
[6]  
Beygelzimer A, 2006, P 23 INT C MACH LEAR, P97, DOI DOI 10.1145/1143844.1143857
[7]  
Campello Ricardo J. G. B., 2013, Advances in Knowledge Discovery and Data Mining. 17th Pacific-Asia Conference (PAKDD 2013). Proceedings, P160, DOI 10.1007/978-3-642-37456-2_14
[8]   Density-Based Clustering over an Evolving Data Stream with Noise [J].
Cao, Feng ;
Ester, Martin ;
Qian, Weining ;
Zhou, Aoying .
PROCEEDINGS OF THE SIXTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2006, :328-+
[9]  
Cassisi C., 2011, DBSTRATA SYSTEM DENS
[10]   Enhancing density-based clustering: Parameter reduction and outlier detection [J].
Cassisi, Carmelo ;
Ferro, Alfredo ;
Giugno, Rosalba ;
Pigola, Giuseppe ;
Pulvirenti, Alfredo .
INFORMATION SYSTEMS, 2013, 38 (03) :317-330