Considering Data Skew in Multi-way Joins for MapReduce

Authors
Wu, Lei [1]
Zhang, Changchun [1]
Meng, Haiyan [1]
Li, Jing [1]
Affiliation
[1] University of Science and Technology of China, School of Computer Science and Technology, Hefei 230026, Anhui, People's Republic of China
Source
2013 8th ChinaGrid Annual Conference (ChinaGrid) | 2013
Keywords
MapReduce; Data skew; Multi-way joins
DOI
10.1109/ChinaGrid.2013.8
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Data analysis and processing are important tasks in cloud computing. MapReduce provides a cost-effective, flexible, fault-tolerant, and scalable distributed programming model for large clusters. However, implementing join operations efficiently in MapReduce remains a challenging problem, and data skew has a strong impact on join performance. In this paper, we implement a sampling-based range partitioning method and apply it to multi-way joins to counter the effect of data skew. Our experimental results show that this approach is more efficient than current algorithms.
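The abstract does not say how the sample is drawn, how range boundaries are chosen, or how a heavily repeated key is handled, so the following is only a minimal single-process sketch under stated assumptions, not the authors' implementation. It assumes Bernoulli sampling of join keys, boundaries placed at evenly spaced sample quantiles, and a multi-way equi-join in which every relation shares one join attribute. For skew it swaps in a split-and-replicate scheme: a key that occupies several boundary slots is assigned to several reducers, the largest relation's tuples for that key are spread among them, and the other relations' tuples are replicated to all of them. All function names (`sample_keys`, `split_points`, `candidate_reducers`, `skew_aware_join`) are hypothetical, and map, shuffle, and reduce are simulated with plain Python collections rather than Hadoop.

```python
import bisect
import random
from collections import defaultdict
from itertools import product


def sample_keys(relation, rate, rng):
    """Bernoulli-sample the join keys of one relation (hypothetical helper)."""
    return [key for key, _ in relation if rng.random() < rate]


def split_points(sample, num_reducers):
    """Choose num_reducers - 1 range boundaries at evenly spaced sample quantiles.

    A heavy key fills several quantile slots, so it shows up as several
    consecutive boundaries; that repetition lets it span several reducers.
    """
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        return []
    return [ordered[(i * n) // num_reducers] for i in range(1, num_reducers)]


def candidate_reducers(key, bounds):
    """Reducers whose key range covers `key`.

    An ordinary key maps to exactly one reducer; a key repeated m times
    among the boundaries maps to m consecutive reducers.
    """
    lo = bisect.bisect_left(bounds, key)
    hi = bisect.bisect_right(bounds, key)
    return list(range(lo, max(hi, lo + 1)))


def skew_aware_join(relations, num_reducers, rate=0.1, seed=0):
    """Simulate one MapReduce round of an equi-join on a shared key.

    relations: dict mapping relation name -> list of (key, payload) pairs.
    Returns (joined tuples, number of records shuffled to each reducer).
    """
    rng = random.Random(seed)
    sample = []
    for rel in relations.values():
        sample.extend(sample_keys(rel, rate, rng))
    bounds = split_points(sample, num_reducers)

    # Assumption: the largest relation is split across a heavy key's
    # reducers; every other relation is replicated to all of them.
    split_rel = max(relations, key=lambda name: len(relations[name]))

    # Map + shuffle: route each relation-tagged tuple to its reducer(s).
    buckets = defaultdict(list)
    for name, rel in relations.items():
        for key, payload in rel:
            targets = candidate_reducers(key, bounds)
            if name == split_rel:
                targets = [rng.choice(targets)]
            for r in targets:
                buckets[r].append((name, key, payload))

    # Reduce: per reducer, group by key and relation, then take the cross product.
    rel_order = list(relations)
    result = []
    for r in sorted(buckets):
        groups = defaultdict(lambda: defaultdict(list))
        for name, key, payload in buckets[r]:
            groups[key][name].append(payload)
        for key, by_rel in groups.items():
            lists = [by_rel.get(name, []) for name in rel_order]
            result.extend((key,) + combo for combo in product(*lists))
    loads = {r: len(recs) for r, recs in sorted(buckets.items())}
    return result, loads


if __name__ == "__main__":
    # Key 1 is skewed in R; with rate=1.0 the boundaries are [1, 1, 2],
    # so key 1 spans reducers 0 and 1.
    R = [(1, f"r{i}") for i in range(8)] + [(2, "rx"), (3, "ry")]
    S = [(1, "s1"), (2, "s2"), (3, "s3")]
    T = [(1, "t1"), (1, "t2"), (3, "t3")]
    rows, loads = skew_aware_join({"R": R, "S": S, "T": T}, num_reducers=4, rate=1.0)
    print(len(rows))  # 17
    print(loads)
```

Because each output tuple contains exactly one tuple from the split relation, and that tuple lives on exactly one reducer alongside every replicated partner, the union of the reducers' outputs matches the ordinary join with no duplicates. The price is extra shuffle volume for the replicated relations, which this sketch exposes through `loads`.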
Pages: 69-73
Page count: 5