Dynamic Workload Balancing for Hadoop MapReduce

Cited by: 13
Authors
Hou, Xiaofei [1]
Kumar, Ashwin T. K. [1]
Thomas, Johnson P. [1]
Varadharaj, Vijay [2]
Affiliations
[1] Oklahoma State Univ, Dept Comp Sci, Stillwater, OK 74078 USA
[2] Macquarie Univ, Dept Comp, Sydney, NSW, Australia
Source
2014 IEEE FOURTH INTERNATIONAL CONFERENCE ON BIG DATA AND CLOUD COMPUTING (BDCLOUD) | 2014
Keywords
Hadoop; MapReduce; Dynamic Workload balancing; OpenFlow;
DOI
10.1109/BDCloud.2014.103
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline classification code
081202 ;
Abstract
Hadoop has two components: HDFS and MapReduce. HDFS is a distributed file system that stores data for Hadoop users, and MapReduce is the framework that executes users' jobs. Hadoop places user data according to the space utilization of the datanodes in the cluster rather than their processing capability. Furthermore, Hadoop often runs in a heterogeneous environment, since the datanodes may not all be homogeneous. For these reasons, workload imbalances occur when Hadoop runs, resulting in poor performance. In this paper, we propose a dynamic algorithm to balance the workload between different racks in a Hadoop cluster, based on information obtained by analyzing Hadoop's log files. Moving tasks from the busiest rack to another rack improves the performance of Hadoop MapReduce by reducing the running time of jobs. Our simulations indicate that our algorithm can decrease by more than 50% the remaining time of the tasks belonging to a job running on the busiest rack.
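The abstract gives only the outline of the approach (estimate per-rack load, then move tasks away from the busiest rack), so the following is a minimal illustrative sketch of that general idea and not the authors' algorithm. Everything in it is an assumption made for illustration: the names `remaining_time` and `rebalance`, the representation of each task as an amount of remaining work, the per-rack `speeds` that stand in for heterogeneous processing capability, and the greedy stopping rule. It also ignores data transfer cost and the log analysis step the paper relies on.

```python
from typing import Dict, List, Tuple


def remaining_time(tasks: List[float], speed: float) -> float:
    """Estimated time for a rack to finish its tasks (hypothetical model:
    total remaining work divided by the rack's processing speed)."""
    return sum(tasks) / speed


def rebalance(
    racks: Dict[str, List[float]],
    speeds: Dict[str, float],
    max_moves: int = 1000,
) -> Tuple[Dict[str, List[float]], List[Tuple[str, str, float]]]:
    """Greedily move tasks from the busiest rack to the least busy rack.

    A move is made only if it lowers the larger of the two racks'
    estimated remaining times; otherwise the loop stops.
    """
    # Work on a copy so the caller's task lists are left untouched.
    racks = {name: list(tasks) for name, tasks in racks.items()}
    moves: List[Tuple[str, str, float]] = []

    for _ in range(max_moves):
        times = {r: remaining_time(t, speeds[r]) for r, t in racks.items()}
        src = max(times, key=times.get)  # busiest rack
        dst = min(times, key=times.get)  # least busy rack
        if src == dst or not racks[src]:
            break

        task = min(racks[src])  # assumption: migrate the smallest task first
        new_src = times[src] - task / speeds[src]
        new_dst = times[dst] + task / speeds[dst]
        if max(new_src, new_dst) >= times[src]:
            break  # moving would not reduce the bottleneck

        racks[src].remove(task)
        racks[dst].append(task)
        moves.append((src, dst, task))

    return racks, moves
```

Calling `rebalance({"rack1": [4, 4, 4, 4], "rack2": [4]}, {"rack1": 1.0, "rack2": 1.0})` returns the new task assignment together with the list of `(source, destination, work)` moves. A real scheduler would also have to account for data locality and inter-rack bandwidth, which this sketch leaves out.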
Pages: 56-62
Page count: 7
Related papers
16 in total
[1]  
[Anonymous], 2004, P 6 S OP SYST DES I
[2]  
[Anonymous], 2008, OSDI 08
[3]  
Ashwin Kumar T. K., 2014, 2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS), P315, DOI 10.1109/ICIS.2014.6912153
[4]  
Das A., 2013, P ACM HOTCL
[5]  
Gandhi Rohan., 2013, Proceedings of the 2013 USENIX Conference on Annual Technical Conference, USENIX ATC'13, P61
[6]   Stepping Motor Control Systerm Based on dsPIC30f6010A [J].
HuangFu, J-F ;
Lu, Gang ;
Li, S-J ;
Chang, J-J .
2010 THE 3RD INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND INDUSTRIAL APPLICATION (PACIIA2010), VOL VIII, 2010, :137-140
[7]  
Li Z., INT J PARALLEL PROGR, P1
[8]   OpenFlow: Enabling innovation in campus networks [J].
McKeown, Nick ;
Anderson, Tom ;
Balakrishnan, Hari ;
Parulkar, Guru ;
Peterson, Larry ;
Rexford, Jennifer ;
Shenker, Scott ;
Turner, Jonathan .
ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2008, 38 (02) :69-74
[9]   Hadoop Acceleration in an OpenFlow-based cluster [J].
Narayan, Sandhya ;
Bailey, Stu ;
Daga, Anand .
2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC), 2012, :535-538
[10]  
Paravastu Rohit., Adaptive Load Balancing in MapReduce using Flubber