Orchestrating an Ensemble of MapReduce Jobs for Minimizing Their Makespan

被引:49
|
作者
Verma, Abhishek [1 ]
Cherkasova, Ludmila [2 ]
Campbell, Roy H. [1 ]
机构
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[2] Hewlett Packard Labs, Palo Alto, CA 94304 USA
关键词
MapReduce; Hadoop; batch workloads; optimized schedule; minimized makespan; SYSTEMS;
D O I
10.1109/TDSC.2013.14
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Cloud computing offers an attractive option for businesses to rent a suitable size MapReduce cluster, consume resources as a service, and pay only for resources that were consumed. A key challenge in such environments is to increase the utilization of MapReduce clusters to minimize their cost. One way of achieving this goal is to optimize the execution of Mapreduce jobs on the cluster. For a set of production jobs that are executed periodically on new data, we can perform an offline analysis for evaluating performance benefits of different optimization techniques. In this work, we consider a subset of production workloads that consists of MapReduce jobs with no dependencies. We observe that the order in which these jobs are executed can have a significant impact on their overall completion time and the cluster resource utilization. Our goal is to automate the design of a job schedule that minimizes the completion time (makespan) of such a set of MapReduce jobs. We introduce a simple abstraction where each MapReduce job is represented as a pair of map and reduce stage durations. This representation enables us to apply the classic Johnson algorithm that was designed for building an optimal two-stage job schedule. We evaluate the performance benefits of the constructed schedule through an extensive set of simulations over a variety of realistic workloads. The results are workload and cluster-size dependent, but it is typical to achieve up to 10-25 percent of makespan improvements by simply processing the jobs in the right order. However, in some cases, the simplified abstraction assumed by Johnson's algorithm may lead to a suboptimal job schedule. We design a novel heuristic, called BalancedPools, that significantly improves Johnson's schedule results (up to 15-38 percent), exactly in the situations when it produces suboptimal makespan. Overall, we observe up to 50 percent in the makespan improvements with the new BalancedPools algorithm. The results of our simulation study are validated through experiments on a 66-node Hadoop cluster.
引用
收藏
页码:314 / 327
页数:14
相关论文
共 50 条
  • [31] Performance Modeling of MapReduce Jobs in Heterogeneous Cloud Environments
    Zhang, Zhuoyao
    Cherkasova, Ludmila
    Boon Thau Loo
    2013 IEEE SIXTH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (CLOUD 2013), 2013, : 839 - 846
  • [32] Minimizing Data Transmission Latency by Bipartite Graph in MapReduce
    Wei, Jie
    Wang, Shangguang
    Zhang, Lingyan
    Zhou, Ao
    Sun, Qibo
    Shi, Ruisheng
    Yang, Fangchun
    2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015, 2015, : 521 - 522
  • [33] Online makespan minimization in MapReduce-like systems with complex reduce tasks
    Taibo Luo
    Yuqing Zhu
    Weili Wu
    Yinfeng Xu
    Ding-Zhu Du
    Optimization Letters, 2017, 11 : 271 - 277
  • [34] MapReduce service provisioning for frequent big data jobs on clouds considering data transfers
    Nabavinejad, Seyed Morteza
    Goudarzi, Maziar
    Abedi, Saeed
    COMPUTERS & ELECTRICAL ENGINEERING, 2018, 71 : 594 - 610
  • [35] Online makespan minimization in MapReduce-like systems with unsplit map tasks
    Pan, Jiayin
    Xu, Yinfeng
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON INDUSTRIAL ENGINEERING AND SYSTEMS MANAGEMENT (IESM 2019), 2019, : 470 - 475
  • [36] Online makespan minimization in MapReduce-like systems with complex reduce tasks
    Luo, Taibo
    Zhu, Yuqing
    Wu, Weili
    Xu, Yinfeng
    Du, Ding-Zhu
    OPTIMIZATION LETTERS, 2017, 11 (02) : 271 - 277
  • [37] Improving Energy Efficiency of IO-Intensive MapReduce Jobs
    Tiwari, Nidhi
    Sarkar, Santonu
    Indrawan-Santiago, Maria
    Bellur, Umesh
    PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING AND NETWORKING, 2015,
  • [38] Impacts of Task Re-Execution Policy on MapReduce Jobs
    Lin, Jia-Chun
    Leu, Fang-Yie
    Chen, Ying-ping
    COMPUTER JOURNAL, 2016, 59 (05): : 701 - 714
  • [39] FARMS: Efficient mapreduce speculation for failure recovery in short jobs
    Fu, Huansong
    Chen, Haiquan
    Zhu, Yue
    Yu, Weikuan
    PARALLEL COMPUTING, 2017, 61 : 68 - 82
  • [40] Efficient Distribution of MapReduce Jobs for Maximizing Profit on Federated Cloud
    Gouasmi, Thouraya
    Louati, Wajdi
    Kacem, Ahmed Hadj
    33RD ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, 2018, : 207 - 209