Performance Modeling of MapReduce Jobs in Heterogeneous Cloud Environments

Cited by: 45
Authors
Zhang, Zhuoyao [1]
Cherkasova, Ludmila [2]
Loo, Boon Thau [1]
Affiliations
[1] Univ Penn, Philadelphia, PA 19104 USA
[2] Hewlett Packard Labs, Palo Alto, CA USA
Source
2013 IEEE SIXTH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (CLOUD 2013) | 2013
Keywords
MapReduce; heterogeneous clusters; performance modeling; efficiency
DOI
10.1109/CLOUD.2013.107
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Many companies are starting to use Hadoop for advanced data analytics over large datasets. While a traditional Hadoop deployment assumes a homogeneous cluster, many enterprise clusters grow incrementally over time and may contain a variety of different servers. This node heterogeneity poses an additional challenge for efficient cluster and job management. Because of resource heterogeneity, it is often unclear which resources introduce inefficiency and bottlenecks, and how such a Hadoop cluster should be configured and optimized. In this work, we explore the efficiency and performance accuracy of the bounds-based performance model for predicting MapReduce job completion times in heterogeneous Hadoop clusters. We validate the accuracy of the proposed performance model using a diverse set of 13 realistic applications and two different heterogeneous clusters. Since one of the Hadoop clusters is formed by VM instances of different capacities in the Amazon EC2 environment, we additionally explore and discuss factors that impact MapReduce job performance in the Cloud.
Pages: 839-846
Number of pages: 8
Related papers
50 in total
  • [21] An optimization framework for the capacity allocation and admission control of MapReduce jobs in cloud systems
    Malekimajd, M.
    Ardagna, D.
    Ciavotta, M.
    Gianniti, E.
    Passacantando, M.
    Rizzi, A. M.
    JOURNAL OF SUPERCOMPUTING, 2018, 74 (10): 5314-5348
  • [22] TMaR: a two-stage MapReduce scheduler for heterogeneous environments
    Maleki, Neda
    Faragardi, Hamid Reza
    Rahmani, Amir Masoud
    Conti, Mauro
    Lofstead, Jay
    HUMAN-CENTRIC COMPUTING AND INFORMATION SCIENCES, 2020, 10 (01)
  • [23] MRTune: A Simulator for Performance Tuning of MapReduce Jobs with Skewed Data
    Zhou, Xibo
    Luo, Wuman
    Tan, Haoyu
    2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2014: 352-359
  • [24] Joint scheduling of MapReduce jobs with servers: Performance bounds and experiments
    Ling, Xiao
    Yuan, Yi
    Wang, Dan
    Liu, Jiangchuan
    Yang, Jiahai
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2016, 90-91: 52-66
  • [25] Performance Prediction Model in Heterogeneous MapReduce Environment
    Fan, Yuanquan
    Wu, Weiguo
    Xu, Yunlong
    Cao, Yangjie
    Li, Qian
    Cui, Jinhua
    Duan, Zhangfeng
    2014 IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (CIT), 2014: 240-245
  • [26] PERFORMANCE MODELING AND OPTIMIZATION OF MAPREDUCE PROGRAMS
    Yin, Jinsong
    Qiao, Yuanyuan
    2014 IEEE 3RD INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2014: 180-186
  • [27] MapReduce optimization algorithm based on machine learning in heterogeneous cloud environment
    LIN Wen-hui
    LEI Zhen-ming
    LIU Jun
    YANG Jie
    LIU Fang
    HE Gang
    WANG Qin
    The Journal of China Universities of Posts and Telecommunications, 2013, (06): 77-87
  • [28] A New Approach to the Cloud-Based Heterogeneous MapReduce Placement Problem
    Xu, Xiaoyong
    Tang, Maolin
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2016, 9 (06): 862-871
  • [29] A Scheduling Algorithm for Hadoop MapReduce Workflows with Budget Constraints in the Heterogeneous Cloud
    Wylie, Andrew
    Shi, Wei
    Corriveau, Jean-Pierre
    Wang, Yang
    2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2016: 1433-1442
  • [30] MRA++: Scheduling and data placement on MapReduce for heterogeneous environments
    Anjos, Julio C. S.
    Carrera, Ivan
    Kolberg, Wagner
    Tibola, Andre Luis
    Arantes, Luciana B.
    Geyer, Claudio R.
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2015, 42: 22-35