MapReduce service provisioning for frequent big data jobs on clouds considering data transfers

Times cited: 1
Authors
Nabavinejad, Seyed Morteza [1, 2]
Goudarzi, Maziar [1]
Abedi, Saeed [1, 3]
Affiliations
[1] Sharif Univ Technol, Dept Comp Engn, Tehran, Iran
[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran, Iran
[3] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
Keywords
Big data; MapReduce; Cloud computing; Hadoop; Energy efficiency
DOI
10.1016/j.compeleceng.2018.08.005
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Many companies regularly run Big Data analyses and need to optimize their resource usage with respect to cost, deadline, and environmental impact simultaneously. The cloud offers a choice among various virtual machine (VM) types, and the number and type of VMs chosen affect outcomes such as the time spent in the data placement and data shuffle phases, each task's energy consumption and execution time, and the makespan of jobs. Considering these factors, we provide provisioning and scheduling algorithms that minimize environmental impact for frequently executed MapReduce jobs. To model the problem mathematically and obtain the optimal solution, we present an Integer Linear Programming (ILP) model, and then propose two heuristic algorithms. We compare the proposed algorithms against a number of competing approaches through extensive simulations based on publicly available real-world data. The results show that our algorithms achieve near-optimal solutions; for example, in some cases their energy consumption is within 0.39% of the optimal solution obtained by the ILP.
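The abstract does not give the ILP formulation or the heuristics, so the following is only a toy sketch of the trade-off it describes: adding VMs shortens data placement and computation but adds shuffle traffic and aggregate power draw, and the best VM type can change with the deadline. It uses plain exhaustive enumeration over (VM type, VM count), not the authors' ILP or heuristics. The cost model, the names `VMType`, `Job`, `makespan`, `energy_j`, `best_config`, and all numbers are hypothetical assumptions chosen for illustration, not data from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class VMType:
    # Hypothetical VM type; all attributes are illustrative assumptions.
    name: str
    speed: float     # compute work units processed per second per VM
    power_w: float   # average power draw per VM, in watts
    bw_gbps: float   # network bandwidth per VM, in GB/s


@dataclass(frozen=True)
class Job:
    # Hypothetical MapReduce job profile.
    input_gb: float    # input data that must be placed on the VMs first
    work: float        # total compute work units
    shuffle_gb: float  # intermediate data exchanged between map and reduce


def makespan(job: Job, vm: VMType, n: int) -> float:
    """Toy makespan model: data placement + computation + shuffle, in seconds."""
    placement = job.input_gb / (n * vm.bw_gbps)
    compute = job.work / (n * vm.speed)
    # Assumption: with n VMs, about (n - 1) / n of the shuffle data crosses the network.
    shuffle = job.shuffle_gb * (n - 1) / n / (n * vm.bw_gbps)
    return placement + compute + shuffle


def energy_j(job: Job, vm: VMType, n: int) -> float:
    """Energy in joules if all n VMs draw power for the whole makespan."""
    return n * vm.power_w * makespan(job, vm, n)


def best_config(job: Job, vm_types: List[VMType], deadline_s: float,
                max_vms: int) -> Optional[Tuple[str, int, float]]:
    """Exhaustively pick the (type, count) with minimum energy that meets the deadline."""
    best: Optional[Tuple[str, int, float]] = None
    for vm in vm_types:
        for n in range(1, max_vms + 1):
            if makespan(job, vm, n) > deadline_s:
                continue  # infeasible: misses the deadline
            e = energy_j(job, vm, n)
            if best is None or e < best[2]:
                best = (vm.name, n, e)
    return best  # None if no configuration meets the deadline


small = VMType("small", speed=1.0, power_w=30.0, bw_gbps=0.1)
large = VMType("large", speed=4.0, power_w=180.0, bw_gbps=0.4)
job = Job(input_gb=10.0, work=400.0, shuffle_gb=5.0)
print(best_config(job, [small, large], deadline_s=60.0, max_vms=16))  # ('small', 10, 16350.0)
print(best_config(job, [small, large], deadline_s=30.0, max_vms=16))  # ('large', 5, 24300.0)
```

With the loose 60 s deadline, ten low-power VMs are the cheapest feasible option; with a 30 s deadline, the small type cannot finish within 16 VMs, so five large VMs are chosen despite higher energy. Because the search space here is tiny, brute force suffices; realistic instances with many job types and VM types are where an ILP or heuristic becomes necessary.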
Pages: 594-610
Page count: 17