An Improved DBSCAN Algorithm Based on the Neighbor Similarity and Fast Nearest Neighbor Query

被引:66
作者
Li, Shan-Shan [1 ,2 ,3 ]
机构
[1] Huaqiao Univ, Dept IT Dev & Management, Xiamen 361021, Peoples R China
[2] Huaqiao Univ, Informat Construct & Management Off, Xiamen 361021, Peoples R China
[3] Soochow Univ, Prov Key Lab Comp Informat Proc Technol, Suzhou 215006, Peoples R China
基金
中国国家自然科学基金;
关键词
Clustering; DBSCAN; neighbor similarity; cover tree; SEARCH;
D O I
10.1109/ACCESS.2020.2972034
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
DBSCAN is the most famous density based clustering algorithm which is one of the main clustering paradigms. However, there are many redundant distance computations among the process of DBSCAN clustering, due to brute force Range-Query used to retrieve neighbors for each point in DBSCAN, which yields high complexity (O(n(2))) and low efficiency. Thus, it is unsuitable and not applicable for large scale data. In this paper, an improved DBSCAN based on neighbor similarity is proposed, which utilizes and Cover Tree to retrieve neighbors for each point in parallel, and the triangle inequality to filter many unnecessary distance computations. From the experiments conducted on large scale data sets, it is demonstrated that the proposed algorithm greatly speedup the original DBSCAN, and outperform the main improvements of DBSCAN. Comparing with rho-approximate DBSCAN, which is the current fastest but approximate version of DBSCAN, the proposed algorithm has two advantages: one is faster and the other is that the result is accurate.
引用
收藏
页码:47468 / 47476
页数:9
相关论文
共 30 条
[1]  
Ankerst M, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P49
[2]  
[Anonymous], [No title captured]
[3]  
[Anonymous], 2004, SIGMOD, DOI DOI 10.1145/1007568.1007620
[4]  
[Anonymous], 2013, THESIS TU EINDHOVEN
[5]  
Beygelzimer A., 2006, ICML
[6]   SPARCL: Efficient and Effective Shape-based Clustering [J].
Chaoji, Vineet ;
Al Hasan, Mohammad ;
Salem, Saeed ;
Zaki, Mohammed J. .
ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2008, :93-102
[7]   KNN-BLOCK DBSCAN: Fast Clustering for Large-Scale Data [J].
Chen, Yewang ;
Zhou, Lida ;
Pei, Songwen ;
Yu, Zhiwen ;
Chen, Yi ;
Liu, Xin ;
Du, Jixiang ;
Xiong, Naixue .
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2021, 51 (06) :3939-3953
[8]   Fast density peak clustering for large scale data based on kNN [J].
Chen, Yewang ;
Hu, Xiaoliang ;
Fan, Wentao ;
Shen, Lianlian ;
Zhang, Zheng ;
Liu, Xin ;
Du, Jixiang ;
Li, Haibo ;
Chen, Yi ;
Li, Hailin .
KNOWLEDGE-BASED SYSTEMS, 2020, 187
[9]   Semi-Convex Hull Tree: Fast Nearest Neighbor Queries for Large Scale Data on GPUs [J].
Chen, Yewang ;
Zhou, Lida ;
Bouguila, Nizar ;
Zhong, Bineng ;
Wu, Fei ;
Lei, Zhen ;
Du, Jixiang ;
Li, Hailin .
2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2018, :911-916
[10]   Fast neighbor search by using revised k-d tree [J].
Chen, Yewang ;
Zhou, Lida ;
Tang, Yi ;
Singh, Jai Puneet ;
Bouguila, Nizar ;
Wang, Cheng ;
Wang, Huazhen ;
Du, Jixiang .
INFORMATION SCIENCES, 2019, 472 :145-162