Performance Modeling of MapReduce Jobs in Heterogeneous Cloud Environments

Times cited: 45
Authors
Zhang, Zhuoyao [1]
Cherkasova, Ludmila [2]
Loo, Boon Thau [1]
Affiliations
[1] Univ Penn, Philadelphia, PA 19104 USA
[2] Hewlett Packard Labs, Palo Alto, CA USA
Source
2013 IEEE SIXTH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (CLOUD 2013) | 2013
Keywords
MapReduce; heterogeneous clusters; performance modeling; efficiency
DOI
10.1109/CLOUD.2013.107
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Many companies are starting to use Hadoop for advanced data analytics over large datasets. While a traditional Hadoop cluster deployment assumes a homogeneous cluster, many enterprise clusters are grown incrementally over time and may contain a variety of different servers. Node heterogeneity poses an additional challenge for efficient cluster and job management. Due to resource heterogeneity, it is often unclear which resources introduce inefficiencies and bottlenecks, and how such a Hadoop cluster should be configured and optimized. In this work, we explore the efficiency and prediction accuracy of the bounds-based performance model for predicting MapReduce job completion times in heterogeneous Hadoop clusters. We validate the accuracy of the proposed performance model using a diverse set of 13 realistic applications and two different heterogeneous clusters. Since one of the Hadoop clusters is formed by VM instances of different capacities in the Amazon EC2 environment, we additionally explore and discuss factors that impact MapReduce job performance in the Cloud.
Pages: 839-846
Number of pages: 8
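The abstract refers to a bounds-based performance model for predicting job completion times but does not spell it out. As an illustration only, the sketch below assumes the classic makespan bounds for greedily assigning n tasks to k slots (lower bound n*avg/k, upper bound (n-1)*avg/k + max), which are commonly used in bounds-based MapReduce models; it is not claimed to be this paper's exact formulation. The names makespan_bounds, greedy_makespan, job_completion_bounds, and Bounds are hypothetical, the midpoint estimate is an assumption, and adding map and reduce bounds is a simplification that ignores the shuffle phase and any stage overlap.

```python
import heapq
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class Bounds:
    """Lower/upper bounds on a completion time (same unit as task durations)."""
    lower: float
    upper: float

    @property
    def estimate(self) -> float:
        # Assumed point estimate: midpoint of the two bounds.
        return (self.lower + self.upper) / 2


def makespan_bounds(durations: Sequence[float], slots: int) -> Bounds:
    """Classic bounds for greedily assigning tasks to `slots` identical slots."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    if not durations:
        return Bounds(0.0, 0.0)
    n = len(durations)
    avg = mean(durations)
    longest = max(durations)
    lower = n * avg / slots                  # total work spread evenly over slots
    upper = (n - 1) * avg / slots + longest  # worst case: longest task runs after the rest
    return Bounds(lower, upper)


def greedy_makespan(durations: Sequence[float], slots: int) -> float:
    """Simulate greedy assignment: each task goes to the earliest-free slot."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    free_at = [0.0] * slots  # min-heap of slot release times (all equal, so a valid heap)
    finish = 0.0
    for d in durations:
        start = heapq.heappop(free_at)
        end = start + d
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


def job_completion_bounds(map_durations: Sequence[float], map_slots: int,
                          reduce_durations: Sequence[float], reduce_slots: int) -> Bounds:
    """Simplified two-stage model: map bounds + reduce bounds (no shuffle, no overlap)."""
    m = makespan_bounds(map_durations, map_slots)
    r = makespan_bounds(reduce_durations, reduce_slots)
    return Bounds(m.lower + r.lower, m.upper + r.upper)
```

For example, four map tasks of 10, 20, 30 and 40 seconds on 2 slots give bounds of 50 s and 77.5 s, and the simulated greedy schedule finishes at 60 s, inside that range. The bounds need only the task count, mean, and maximum duration, which is what makes this style of model cheap to evaluate for different slot counts.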