VDMR-DBSCAN: Varied Density MapReduce DBSCAN

被引:5
作者
Bhardwaj, Surbhi [1 ]
Dash, Subrat Kumar [1 ]
机构
[1] LNM Inst Informat Technol, Dept Comp Sci & Engn, Jaipur, Rajasthan, India
来源
BIG DATA ANALYTICS, BDA 2015 | 2015年 / 9498卷
关键词
Clustering; DBSCAN; Varied density; MapReduce; Hadoop; Scalable algorithm; ALGORITHM;
D O I
10.1007/978-3-319-27057-9_10
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
DBSCAN is a well-known density based clustering algorithm, which can discover clusters of different shapes and sizes along with out-liers. However, it suffers from major drawbacks like high computational cost, inability to find varied density clusters and dependency on user provided input density parameters. To address these issues, we propose a novel density based clustering algorithm titled, VDMR-DBSCAN (Varied Density MapReduce DBSCAN), a scalable DBSCAN algorithm using MapReduce which can detect varied density clusters with automatic computation of input density parameters. VDMR-DBSCAN divides the data into small partitions which are parallely processed on Hadoop platform. Thereafter, density variations in a partition are analyzed statistically to divide the data into groups of similar density called Density level sets (DLS). Input density parameters are estimated for each DLS, later DBSCAN is applied on each DLS using its corresponding density parameters. Most importantly, we propose a novel merging technique, which merges the similar density clusters present in different partitions and produces meaningful and compact clusters of varied density. We experimented on large and small synthetic datasets which well confirms the efficacy of our algorithm in terms of scalability and ability to find varied density clusters.
引用
收藏
页码:134 / 150
页数:17
相关论文
共 15 条
[1]  
Ankerst M, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P49
[2]   ST-DBSCAN: An algorithm for clustering spatial-temp oral data [J].
Birant, Derya ;
Kut, Alp .
DATA & KNOWLEDGE ENGINEERING, 2007, 60 (01) :208-221
[3]  
Dai B.-R., 2012, 2012 IEEE 5 INT C CL
[4]  
Dean J, 2004, USENIX ASSOCIATION PROCEEDINGS OF THE SIXTH SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDE '04), P137
[5]  
Emre C.M., 2005, 2005 INT C INF TECHN, V1
[6]  
Ester M., 1996, KDD-96 Proceedings. Second International Conference on Knowledge Discovery and Data Mining, P226
[7]   Multidimensional access methods [J].
Gaede, V ;
Gunther, O .
ACM COMPUTING SURVEYS, 1998, 30 (02) :170-231
[8]  
Hadoop WT, 2009, THE DEFINITIVE GUIDE
[9]  
Han J, 2012, MOR KAUF D, P1
[10]   MR-DBSCAN: a scalable MapReduce-based DBSCAN algorithm for heavily skewed data [J].
He, Yaobin ;
Tan, Haoyu ;
Luo, Wuman ;
Feng, Shengzhong ;
Fan, Jianping .
FRONTIERS OF COMPUTER SCIENCE, 2014, 8 (01) :83-99