MR-DBSCAN: An Efficient Parallel Density-based Clustering Algorithm using MapReduce

被引:125
作者
He, Yaobin [1 ,2 ]
Tan, Haoyu [3 ]
Luo, Wuman [3 ]
Mao, Huajian [4 ]
Ma, Di [1 ]
Feng, Shengzhong [1 ]
Fan, Jianping [1 ]
机构
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
[2] Grad Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Hong Kong Univ Sci & Technol, Kowloon, Peoples R China
[4] Natl Univ Def Technol, Changhua, Hunan 410073, Taiwan
来源
2011 IEEE 17TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS) | 2011年
关键词
DBSCAN; MapReduce; parallel system; data mining;
D O I
10.1109/ICPADS.2011.83
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Data clustering is an important data mining technology that plays a crucial role in numerous scientific applications. However, it is challenging due to the size of datasets has been growing rapidly to extra-large scale in the real world. Meanwhile, MapReduce is a desirable parallel programming platform that is widely applied in kinds of data process fields. In this paper, we propose an efficient parallel density-based clustering algorithm and implement it by a 4-stages MapReduce paradigm. Furthermore, we adopt a quick partitioning strategy for large scale non-indexed data. We study the metric of merge among bordering partitions and make optimizations on it. At last, we evaluate our work on real large scale datasets using Hadoop platform. Results reveal that the speedup and scaleup of our work are very efficient.
引用
收藏
页码:473 / 480
页数:8
相关论文
共 16 条
[1]  
Ankerst M., 1999, SIGMOD Record, V28, P49, DOI 10.1145/304181.304187
[2]  
Arlia D., 2001, Euro-Par 2001 Parallel Processing. 7th International Euro-Par Conference. Proceedings (Lecture Notes in Computer Science Vol.2150), P326
[3]  
BECKMANN N, 1990, SIGMOD REC, V19, P322, DOI 10.1145/93605.98741
[4]   MULTIDIMENSIONAL BINARY SEARCH TREES USED FOR ASSOCIATIVE SEARCHING [J].
BENTLEY, JL .
COMMUNICATIONS OF THE ACM, 1975, 18 (09) :509-517
[5]  
BERGER MJ, 1987, IEEE T COMPUT, V36, P570, DOI 10.1109/TC.1987.1676942
[6]  
Dean J, 2004, USENIX ASSOCIATION PROCEEDINGS OF THE SIXTH SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDE '04), P137
[7]  
Ester M., 1996, DENSITY BASED ALGORI, DOI DOI 10.5555/3001460.3001507
[8]  
Guttman A., 1984, SIGMOD Record, V14, P47, DOI 10.1145/971697.602266
[9]  
Januzaj E, 2004, LECT NOTES ARTIF INT, V3202, P231
[10]  
Kwon Y, 2010, LECT NOTES COMPUT SC, V6187, P132, DOI 10.1007/978-3-642-13818-8_11