iShuffle: Improving Hadoop Performance with Shuffle-on-Write

Cited by: 48
Authors
Guo, Yanfei [1]
Rao, Jia [1]
Cheng, Dazhao [1]
Zhou, Xiaobo [1]
Affiliation
[1] Univ Colorado, Dept Comp Sci, Colorado Springs, CO 80918 USA
Funding
U.S. National Science Foundation
Keywords
MapReduce; shuffle; data skew; task scheduling
DOI
10.1109/TPDS.2016.2587645
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
Hadoop is a popular implementation of the MapReduce framework for running data-intensive jobs on clusters of commodity servers. Shuffle, the all-to-all input data fetching phase between the map and reduce phases, can significantly affect job performance. However, Hadoop couples the shuffle phase with the reduce phase, so shuffle can only be performed by running reduce tasks. This leaves the potential parallelism between multiple waves of map and reduce tasks unexploited and wastes resources, which significantly delays job completion in multi-tenant Hadoop clusters. More importantly, Hadoop cannot schedule tasks efficiently or mitigate the data distribution skew among reduce tasks, which further degrades job performance. In this work, we propose to decouple shuffle from reduce tasks and convert it into a platform service provided by Hadoop. We present iShuffle, a user-transparent shuffle service that proactively pushes map output data to nodes via a novel shuffle-on-write operation and flexibly schedules reduce tasks considering workload balance. Experimental results with representative workloads and a Facebook workload trace show that iShuffle reduces job completion time by as much as 29.6 and 34 percent in single-user and multi-user clusters, respectively.
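The abstract says iShuffle places shuffled data and schedules reduce tasks with workload balance in mind, to counter skew among reduce partitions. This record does not describe the paper's actual placement algorithm, so the following is only an illustrative sketch using a different, classic technique: the greedy longest-processing-time (LPT) heuristic, which assigns each reduce partition (largest first) to the node that currently holds the least data. The function names, partition sizes, and node names are hypothetical, and the sketch assumes partition sizes are known (for example, predicted from early map output) before placement and that node names are unique.

```python
import heapq
from typing import Dict, List, Tuple


def place_partitions(partition_sizes: Dict[int, int], nodes: List[str]) -> Dict[str, List[int]]:
    """Hypothetical helper: greedy LPT placement of reduce partitions onto nodes.

    Illustrative only; not iShuffle's published algorithm.
    """
    if not nodes:
        raise ValueError("need at least one node")
    # Min-heap of (bytes assigned so far, node index): the root is the least-loaded node.
    heap: List[Tuple[int, int]] = [(0, i) for i in range(len(nodes))]
    heapq.heapify(heap)
    placement: Dict[str, List[int]] = {n: [] for n in nodes}
    # Largest partitions first; ties broken by partition id for determinism.
    for pid, size in sorted(partition_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        placement[nodes[idx]].append(pid)
        heapq.heappush(heap, (load + size, idx))
    return placement


def node_loads(placement: Dict[str, List[int]], partition_sizes: Dict[int, int]) -> Dict[str, int]:
    """Total bytes each node would receive under a given placement."""
    return {node: sum(partition_sizes[p] for p in pids) for node, pids in placement.items()}


if __name__ == "__main__":
    sizes = {0: 70, 1: 50, 2: 40, 3: 30, 4: 10}  # skewed partition sizes (made-up numbers)
    balanced = place_partitions(sizes, ["n1", "n2"])
    print(balanced)                     # {'n1': [0, 3], 'n2': [1, 2, 4]}
    print(node_loads(balanced, sizes))  # {'n1': 100, 'n2': 100}
```

For comparison, a naive `pid % 2` assignment of the same partitions would put 120 units on one node and 80 on the other. LPT is a simple approximation for this NP-hard balancing problem; a real shuffle service would also have to weigh data locality and when partition sizes become known.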
Pages: 1649-1662
Page count: 14