Performance Variations in Resource Scaling for MapReduce Applications on Private and Public Clouds

Cited by: 1
Authors
Zhang, Fan [1]
Sakr, Majd [2]
Affiliations
[1] MIT, 185 Albany St, Cambridge, MA 02139 USA
[2] Carnegie Mellon Univ, 5000 Forbes Ave, Pittsburgh, PA 15213 USA
Source
2014 IEEE 7TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (CLOUD) | 2014
Keywords
Cloud computing; MapReduce applications; dataset size; input scaling; parallel computing
DOI
10.1109/CLOUD.2014.68
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we delineate the causes of performance variations when scaling provisioned virtual resources for a variety of MapReduce applications. Hadoop MapReduce facilitates the development and execution of large-scale batch applications on big data. However, provisioning suitable resources to achieve the desired performance at an affordable cost requires expertise in the execution model of MapReduce, the resources available for provisioning, and the execution behavior of the application at hand. As an initial step towards automating this process, we characterize the difference in execution response for different MapReduce applications while varying the number of virtualized CPUs, the memory resources, the number of map slots, and the cluster size on a private cloud. This characterization helps illustrate the performance variation (5x compared to 36x speedup) between Reduce-intensive and Map-intensive applications in effectively utilizing provisioned resources at different scales (1-64 VMs). By comparing the scalability efficiency, we clearly indicate the under-provisioning or over-provisioning of resources for different MapReduce applications at large scale.
Pages: 457-466
Page count: 10
Related papers
22 in total
[1] Abad C. L., 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC 2012), P100, DOI 10.1109/IISWC.2012.6402909
[2] Ahmad Faraz, PUMA PURDUE MAPREDUC
[3] [Anonymous], P VER LARG DAT C VLD
[4] [Anonymous], 2009, Hadoop: The Definitive Guide
[5] Dean J, 2004, USENIX ASSOCIATION PROCEEDINGS OF THE SIXTH SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI '04), P137
[6] Ekanayake J., P 2008 4 IEEE INT C, P277
[7] Fadika Z., 2011, GRID COMPUTING IEEEA, P90
[8] Guo Z., 2012, 12 IEEE ACM INT S CL
[9] Hammoud M., 2012, TECHNICAL REPORT
[10] Heintz, Benjamin; Wang, Chenyu; Chandra, Abhishek; Weissman, Jon. Cross-Phase Optimization in MapReduce [J]. PROCEEDINGS OF THE 2013 IEEE INTERNATIONAL CONFERENCE ON CLOUD ENGINEERING (IC2E 2013), 2013: 338-347