Orchestrating an Ensemble of MapReduce Jobs for Minimizing Their Makespan

被引:49
|
作者
Verma, Abhishek [1 ]
Cherkasova, Ludmila [2 ]
Campbell, Roy H. [1 ]
机构
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[2] Hewlett Packard Labs, Palo Alto, CA 94304 USA
关键词
MapReduce; Hadoop; batch workloads; optimized schedule; minimized makespan; SYSTEMS;
D O I
10.1109/TDSC.2013.14
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Cloud computing offers an attractive option for businesses to rent a suitable size MapReduce cluster, consume resources as a service, and pay only for resources that were consumed. A key challenge in such environments is to increase the utilization of MapReduce clusters to minimize their cost. One way of achieving this goal is to optimize the execution of Mapreduce jobs on the cluster. For a set of production jobs that are executed periodically on new data, we can perform an offline analysis for evaluating performance benefits of different optimization techniques. In this work, we consider a subset of production workloads that consists of MapReduce jobs with no dependencies. We observe that the order in which these jobs are executed can have a significant impact on their overall completion time and the cluster resource utilization. Our goal is to automate the design of a job schedule that minimizes the completion time (makespan) of such a set of MapReduce jobs. We introduce a simple abstraction where each MapReduce job is represented as a pair of map and reduce stage durations. This representation enables us to apply the classic Johnson algorithm that was designed for building an optimal two-stage job schedule. We evaluate the performance benefits of the constructed schedule through an extensive set of simulations over a variety of realistic workloads. The results are workload and cluster-size dependent, but it is typical to achieve up to 10-25 percent of makespan improvements by simply processing the jobs in the right order. However, in some cases, the simplified abstraction assumed by Johnson's algorithm may lead to a suboptimal job schedule. We design a novel heuristic, called BalancedPools, that significantly improves Johnson's schedule results (up to 15-38 percent), exactly in the situations when it produces suboptimal makespan. Overall, we observe up to 50 percent in the makespan improvements with the new BalancedPools algorithm. The results of our simulation study are validated through experiments on a 66-node Hadoop cluster.
引用
收藏
页码:314 / 327
页数:14
相关论文
共 50 条
  • [1] A Note on "Orchestrating an Ensemble of MapReduce Jobs for Minimizing Their Makespan"
    Tian, Wenhong
    Chen, Yu
    Wang, Xinyang
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2014, 11 (04) : 390 - 391
  • [2] On Minimizing the Makespan Of A Set of Offline MapReduce Jobs
    He, Majun
    Guo, Wenxia
    Huang, Houwen
    He, Bo
    Wang, Jing
    Tian, Wenhong
    2018 IEEE WORLD CONGRESS ON SERVICES (IEEE SERVICES 2018), 2018, : 1 - 2
  • [3] HScheduler: an optimal approach to minimize the makespan of multiple MapReduce jobs
    Tian, Wenhong
    Li, Guozhong
    Yang, Wutong
    Buyya, Rajkumar
    JOURNAL OF SUPERCOMPUTING, 2016, 72 (06): : 2376 - 2393
  • [4] HScheduler: an optimal approach to minimize the makespan of multiple MapReduce jobs
    Wenhong Tian
    Guozhong Li
    Wutong Yang
    Rajkumar Buyya
    The Journal of Supercomputing, 2016, 72 : 2376 - 2393
  • [5] A Heuristic Fault Tolerant MapReduce Framework for Minimizing Makespan in Hybrid Cloud Environment
    Raju, R.
    Amudhavel, J.
    Pavithra, M.
    Anuja, S.
    Abinaya, B.
    2014 INTERNATIONAL CONFERENCE ON GREEN COMPUTING COMMUNICATION AND ELECTRICAL ENGINEERING (ICGCCEE), 2014,
  • [6] A Task-Based Greedy Scheduling Algorithm for Minimizing Energy of MapReduce Jobs
    Mostafa Hadadian Nejad Yousefi
    Maziar Goudarzi
    Journal of Grid Computing, 2018, 16 : 535 - 551
  • [7] A Task-Based Greedy Scheduling Algorithm for Minimizing Energy of MapReduce Jobs
    Yousefi, Mostafa Hadadian Nejad
    Goudarzi, Maziar
    JOURNAL OF GRID COMPUTING, 2018, 16 (04) : 535 - 551
  • [8] Sharing across Multiple MapReduce Jobs
    Nykiel, Tomasz
    Potamias, Michalis
    Mishra, Chaitanya
    Kollios, George
    Koudas, Nick
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2014, 39 (02):
  • [9] Analysis and Improvement of Makespan and Utilization for MapReduce
    Li, Yin
    Lin, Chuang
    Ren, Fengyuan
    2013 IEEE 15TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2013 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (HPCC_EUC), 2013, : 434 - 441
  • [10] Marimba: A Framework for Making MapReduce Jobs Incremental
    Schildgen, Johannes
    Joerg, Thomas
    Hoffmann, Manuel
    Dessloch, Stefan
    2014 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS), 2014, : 128 - 135