Towards efficient resource provisioning in MapReduce

Cited by: 32
Authors
Nghiem, Peter P. [1]
Figueira, Silvia M. [1]
Affiliations
[1] Santa Clara Univ, Dept Comp Engn, 500 El Camino Real, Santa Clara, CA 95053 USA
Keywords
Hadoop MapReduce; Spark; Optimal resource provisioning; Energy efficiency; Runtime elbow curve; YARN
DOI
10.1016/j.jpdc.2016.04.001
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
The paper presents a novel approach and algorithm with mathematical formula for obtaining the exact optimal number of task resources for any workload running on Hadoop MapReduce. In the era of Big Data, energy efficiency has become an important issue for the ubiquitous Hadoop MapReduce framework. However, the question of what is the optimal number of tasks required for a job to get the most efficient performance from MapReduce still has no definite answer. Our algorithm for optimal resource provisioning allows users to identify the best trade-off point between performance and energy efficiency on the runtime elbow curve fitted from sampled executions on the target cluster for subsequent behavioral replication. Our verification and comparison show that the currently well-known rules of thumb for calculating the required number of reduce tasks are inaccurate and could lead to significant waste of computing resources and energy with no further improvement in execution time. © 2016 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license.
Pages: 29-41
Number of pages: 13