SmartJoin: a network-aware multiway join for MapReduce

Cited by: 0
Authors
Kenn Slagter
Ching-Hsien Hsu
Yeh-Ching Chung
Gangman Yi
Affiliations
[1] National Tsing Hua University, Department of Computer Science
[2] Chung Hua University, Department of Computer Science
[3] Gangneung-Wonju National University, Department of Computer Science
Source
Cluster Computing | 2014 / Vol. 17
Keywords
MapReduce; Hadoop; Multiway join; Workload redistribution
DOI
Not available
Abstract
MapReduce is an effective tool for processing large amounts of data in parallel on a cluster of processors or computers. One common data processing task is the join operation, which combines two or more datasets based on values common to each. In this paper, we present SmartJoin, a network-aware multiway join for MapReduce that improves performance by taking network traffic into account when redistributing workload among reducers. SmartJoin achieves this by dynamically redistributing tuples directly between reducers using an intelligent network-aware algorithm. We show that the technique has significant potential to reduce the time required to join multiple datasets. In our evaluation, SmartJoin achieves up to a 39% improvement over the non-redistribution method, a 26.8% improvement over random redistribution, and a 27.6% improvement over worst join redistribution.
Pages: 629-641
Page count: 12