Orchestrating an Ensemble of MapReduce Jobs for Minimizing Their Makespan

Cited by: 49
Authors
Verma, Abhishek [1]
Cherkasova, Ludmila [2]
Campbell, Roy H. [1]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[2] Hewlett Packard Labs, Palo Alto, CA 94304 USA
Keywords
MapReduce; Hadoop; batch workloads; optimized schedule; minimized makespan; SYSTEMS;
DOI
10.1109/TDSC.2013.14
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Cloud computing offers an attractive option for businesses to rent a suitably sized MapReduce cluster, consume resources as a service, and pay only for the resources consumed. A key challenge in such environments is to increase the utilization of MapReduce clusters to minimize their cost. One way of achieving this goal is to optimize the execution of MapReduce jobs on the cluster. For a set of production jobs that are executed periodically on new data, we can perform an offline analysis to evaluate the performance benefits of different optimization techniques. In this work, we consider a subset of production workloads that consists of MapReduce jobs with no dependencies. We observe that the order in which these jobs are executed can have a significant impact on their overall completion time and on cluster resource utilization. Our goal is to automate the design of a job schedule that minimizes the completion time (makespan) of such a set of MapReduce jobs. We introduce a simple abstraction where each MapReduce job is represented as a pair of map and reduce stage durations. This representation enables us to apply the classic Johnson algorithm, which was designed for building an optimal two-stage job schedule. We evaluate the performance benefits of the constructed schedule through an extensive set of simulations over a variety of realistic workloads. The results are workload and cluster-size dependent, but it is typical to achieve makespan improvements of up to 10-25 percent by simply processing the jobs in the right order. However, in some cases, the simplified abstraction assumed by Johnson's algorithm may lead to a suboptimal job schedule. We design a novel heuristic, called BalancedPools, that significantly improves on the results of Johnson's schedule (by up to 15-38 percent), precisely in the situations where it produces a suboptimal makespan. Overall, we observe makespan improvements of up to 50 percent with the new BalancedPools algorithm.
The results of our simulation study are validated through experiments on a 66-node Hadoop cluster.
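The abstraction described in the abstract, where each job is a pair of map and reduce stage durations ordered by Johnson's algorithm, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `johnson_order` and `makespan` and the job durations are hypothetical, and the sketch assumes the classic two-stage flow-shop model in which each stage processes one job at a time and a job's reduce stage starts only after its map stage has finished.

```python
from typing import List, Tuple

Job = Tuple[int, int]  # (map stage duration, reduce stage duration); illustrative units


def johnson_order(jobs: List[Job]) -> List[Job]:
    """Order jobs by Johnson's rule for a two-stage flow shop.

    Jobs whose map stage is no longer than their reduce stage run first,
    in increasing order of map duration. The remaining jobs run last,
    in decreasing order of reduce duration.
    """
    front = sorted((j for j in jobs if j[0] <= j[1]), key=lambda j: j[0])
    back = sorted((j for j in jobs if j[0] > j[1]), key=lambda j: j[1], reverse=True)
    return front + back


def makespan(schedule: List[Job]) -> int:
    """Completion time of the last reduce stage for a given job order.

    Assumption: the map stages run back to back, and each reduce stage
    waits for both its own map stage and the previous reduce stage.
    """
    map_done = 0
    reduce_done = 0
    for map_time, reduce_time in schedule:
        map_done += map_time
        reduce_done = max(reduce_done, map_done) + reduce_time
    return reduce_done


# Hypothetical durations, chosen only to show the effect of ordering.
jobs = [(4, 5), (1, 4), (30, 4), (6, 30), (2, 3)]
ordered = johnson_order(jobs)
print(ordered, makespan(ordered))
print(makespan(list(reversed(ordered))))
```

For these illustrative durations the Johnson order finishes in 47 time units, while the reverse order takes 78, which shows how strongly job order alone can affect the makespan under this simplified model.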
Pages: 314-327
Page count: 14
Related papers
50 items in total
  • [21] Energy-aware Scheduling of MapReduce Jobs
    Mashayekhy, Lena
    Nejad, Mahyar Movahed
    Grosu, Daniel
    Lu, Dajun
    Shi, Weisong
    2014 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS), 2014, : 32 - 39
  • [22] Malleable scheduling for flows of jobs and applications to MapReduce
    Nagarajan, Viswanath
    Wolf, Joel
    Balmin, Andrey
    Hildrum, Kirsten
    JOURNAL OF SCHEDULING, 2019, 22 (04) : 393 - 411
  • [23] Multi-objective scheduling of MapReduce jobs in big data processing
    Ibrahim Abaker Targio Hashem
    Nor Badrul Anuar
    Mohsen Marjani
    Abdullah Gani
    Arun Kumar Sangaiah
    Adewole Kayode Sakariyah
    Multimedia Tools and Applications, 2018, 77 : 9979 - 9994
  • [24] Optimization of Computing Time for Sequential MapReduce Jobs
    Li, Detian
    Gu, Tao
    Liao, Qun
    Yang, Yulu
    ICBDC 2019: PROCEEDINGS OF 2019 4TH INTERNATIONAL CONFERENCE ON BIG DATA AND COMPUTING, 2019, : 260 - 264
  • [25] Minimizing Skew in MapReduce Applications using Node Clustering in Heterogeneous Environment
    Nawale, Vishal Ankush
    Deshpande, Priya
    2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN), 2015, : 136 - 139
  • [26] Minimizing the Makespan using Hybrid Algorithm for Cloud Computing
    Raju, R.
    Babukarthik, R. G.
    Chandramohan, D.
    Dhavachelvan, P.
    Vengattaraman, T.
    PROCEEDINGS OF THE 2013 3RD IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2013, : 957 - 962
  • [27] Dynamic processing slots scheduling for I/O intensive jobs of Hadoop MapReduce
    Kurazumi, Shiori
    Tsumura, Tomoaki
    Saito, Shoichi
    Matsuo, Hiroshi
    2012 THIRD INTERNATIONAL CONFERENCE ON NETWORKING AND COMPUTING (ICNC 2012), 2012, : 288 - 292
  • [28] MapReduce short jobs optimization based on resource reuse
    Shi, Yuliang
    Zhang, Kaihui
    Cui, Lizhen
    Liu, Lei
    Zheng, Yongqing
    Zhang, Shidong
    Yu, Han
    MICROPROCESSORS AND MICROSYSTEMS, 2016, 47 : 178 - 187
  • [29] Parallel machine scheduling with splitting jobs in MapReduce system
    Huang J.-D.
    Zheng F.-F.
    Xu Y.-F.
    Liu M.
    Kongzhi yu Juece/Control and Decision, 2019, 34 (07) : 1514 - 1520
  • [30] The Impact of Capacity Scheduler Configuration Settings on MapReduce Jobs
    Chauhan, Jagmohan
    Makaroff, Dwight
    Grassmann, Winfried
    SECOND INTERNATIONAL CONFERENCE ON CLOUD AND GREEN COMPUTING / SECOND INTERNATIONAL CONFERENCE ON SOCIAL COMPUTING AND ITS APPLICATIONS (CGC/SCA 2012), 2012, : 667 - 674