Orchestrating an Ensemble of MapReduce Jobs for Minimizing Their Makespan

Cited by: 49
Authors
Verma, Abhishek [1]
Cherkasova, Ludmila [2]
Campbell, Roy H. [1]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[2] Hewlett Packard Labs, Palo Alto, CA 94304 USA
Keywords
MapReduce; Hadoop; batch workloads; optimized schedule; minimized makespan; SYSTEMS
DOI
10.1109/TDSC.2013.14
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Cloud computing offers an attractive option for businesses to rent a suitably sized MapReduce cluster, consume resources as a service, and pay only for the resources consumed. A key challenge in such environments is to increase the utilization of MapReduce clusters to minimize their cost. One way of achieving this goal is to optimize the execution of MapReduce jobs on the cluster. For a set of production jobs that are executed periodically on new data, we can perform an offline analysis to evaluate the performance benefits of different optimization techniques. In this work, we consider a subset of production workloads that consists of MapReduce jobs with no dependencies. We observe that the order in which these jobs are executed can have a significant impact on their overall completion time and on cluster resource utilization. Our goal is to automate the design of a job schedule that minimizes the completion time (makespan) of such a set of MapReduce jobs. We introduce a simple abstraction in which each MapReduce job is represented as a pair of map and reduce stage durations. This representation enables us to apply the classic Johnson algorithm, which was designed for building an optimal two-stage job schedule. We evaluate the performance benefits of the constructed schedule through an extensive set of simulations over a variety of realistic workloads. The results depend on the workload and the cluster size, but makespan improvements of 10-25 percent are typical when the jobs are simply processed in the right order. However, in some cases, the simplified abstraction assumed by Johnson's algorithm may lead to a suboptimal job schedule. We design a novel heuristic, called BalancedPools, that significantly improves on Johnson's schedule (by up to 15-38 percent) in exactly the situations where that schedule yields a suboptimal makespan. Overall, we observe makespan improvements of up to 50 percent with the new BalancedPools algorithm. The results of our simulation study are validated through experiments on a 66-node Hadoop cluster.
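The two-stage abstraction described in the abstract can be illustrated with a minimal sketch of Johnson's rule. This is not the authors' implementation: the job names, the durations, and the helper functions `johnson_order` and `makespan` are hypothetical, and the makespan model assumes that each stage occupies the whole cluster and that a job's reduce stage starts only after its own map stage and the previous job's reduce stage have finished.

```python
from typing import List, Tuple

# A job is (name, map_duration, reduce_duration); all values here are made up.
Job = Tuple[str, float, float]


def johnson_order(jobs: List[Job]) -> List[Job]:
    """Order jobs by Johnson's rule for a two-stage flow shop.

    Jobs whose map stage is no longer than their reduce stage go first,
    sorted by increasing map duration. The remaining jobs go last,
    sorted by decreasing reduce duration.
    """
    first = sorted((j for j in jobs if j[1] <= j[2]), key=lambda j: j[1])
    last = sorted((j for j in jobs if j[1] > j[2]), key=lambda j: j[2], reverse=True)
    return first + last


def makespan(schedule: List[Job]) -> float:
    """Completion time of the last reduce stage under the simplified model."""
    map_end = 0.0
    reduce_end = 0.0
    for _, map_time, reduce_time in schedule:
        map_end += map_time  # map stages run back to back
        # A reduce stage waits for its own map stage and the previous reduce stage.
        reduce_end = max(reduce_end, map_end) + reduce_time
    return reduce_end


# Hypothetical workload, listed in submission order.
jobs: List[Job] = [
    ("J1", 4, 5),
    ("J2", 4, 1),
    ("J3", 30, 4),
    ("J4", 6, 30),
    ("J5", 2, 3),
]

if __name__ == "__main__":
    ordered = johnson_order(jobs)
    print("submission order makespan:", makespan(jobs))
    print("Johnson order:", [name for name, _, _ in ordered])
    print("Johnson order makespan:", makespan(ordered))
```

Comparing the two printed makespans shows how much the execution order alone can matter under this model. Real MapReduce jobs may use only part of the cluster, which is the situation the abstract says BalancedPools targets; that heuristic is not sketched here because the abstract does not describe its steps.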
Pages: 314-327
Number of pages: 14