CRESP: Towards Optimal Resource Provisioning for MapReduce Computing in Public Clouds

Cited by: 62
Authors
Chen, Keke [1]
Powers, James [1]
Guo, Shumin [1]
Tian, Fengguang [1]
Affiliations
[1] Wright State Univ, Dept Comp Sci & Engn, Ohio Ctr Excellence Knowledge Enabled Comp Kno E, Dayton, OH 45435 USA
Funding
National Science Foundation (USA)
Keywords
MapReduce; cloud computing; resource provisioning; performance modeling
DOI
10.1109/TPDS.2013.297
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Running MapReduce programs in the cloud raises a distinctive problem: how should resource provisioning be optimized to minimize the monetary cost or the finish time of a specific job? We study the entire MapReduce processing workflow and build a cost function that explicitly models the relationship among the time cost, the amount of input data, the available system resources (Map and Reduce slots), and the complexity of the Reduce function for the target MapReduce job. The model parameters can be learned from test runs. Based on this cost function, we can solve a number of decision problems, such as finding the amount of resources that minimizes monetary cost under a job finish deadline, minimizes time cost under a given monetary budget, or achieves the best tradeoff between time and monetary costs. Experimental results show that the proposed approach performs well on a number of sample MapReduce programs on both an in-house cluster and Amazon EC2. We also conduct a variance analysis on different components of the MapReduce workflow to identify possible sources of modeling error. Our optimization results show that the proposed approach can save a significant amount of time and money compared to randomly selected settings.
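
One of the decision problems named in the abstract, minimizing monetary cost under a job finish deadline, can be illustrated with a minimal sketch. The time model below is a hypothetical stand-in and not the paper's cost function: it assumes a fixed overhead plus map and reduce work divided evenly across slots. The function names, the default parameter values, and the per-slot price are all assumptions made for illustration; in the approach described above, the model parameters would be learned from test runs.

```python
from itertools import product


def job_time(m, r, overhead=60.0, map_work=36000.0, reduce_work=12000.0):
    """Hypothetical job time in seconds for m Map slots and r Reduce slots.

    Assumed form: fixed overhead plus work split evenly across slots.
    This is an illustrative stand-in, not the cost function from the paper.
    """
    return overhead + map_work / m + reduce_work / r


def cheapest_within_deadline(deadline, price_per_slot_sec=0.0001, max_slots=64):
    """Exhaustively search (m, r) pairs for the lowest monetary cost.

    Returns (cost, m, r, time) for the cheapest setting that meets the
    deadline, or None if no setting up to max_slots is fast enough.
    """
    best = None
    for m, r in product(range(1, max_slots + 1), repeat=2):
        t = job_time(m, r)
        if t > deadline:
            continue  # violates the job finish deadline
        # Assumed pricing: every slot is billed for the whole job duration.
        cost = price_per_slot_sec * (m + r) * t
        if best is None or cost < best[0]:
            best = (cost, m, r, t)
    return best


if __name__ == "__main__":
    result = cheapest_within_deadline(deadline=1800.0)
    if result is None:
        print("No feasible setting within the deadline")
    else:
        cost, m, r, t = result
        print(f"map slots={m}, reduce slots={r}, time={t:.1f}s, cost=${cost:.4f}")
```

Exhaustive search is used here only because the slot grid is small. The budget-constrained variant (minimize time under a monetary budget) follows by swapping the constraint and the objective.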
Pages: 1403-1412
Number of pages: 10