Efficient jobs scheduling approach for big data applications

Cited by: 29
Authors
Shao, Yanling [1,3]
Li, Chunlin [1]
Gu, Jinguang [2]
Zhang, Jing [1]
Luo, Youlong [1]
Affiliations
[1] Wuhan Univ Technol, Dept Comp Sci, Wuhan 430063, Hubei, Peoples R China
[2] Wuhan Univ Sci & Technol, Hubei Prov Key Lab Intelligent Informat Proc & Re, Wuhan, Hubei, Peoples R China
[3] Nanyang Inst Technol, Coll Comp & Informat Engn, Nanyang 473000, Peoples R China
Keywords
Big data; Dynamic scheduling; Energy efficiency; MapReduce; Resource allocation; MINIMIZE TARDINESS PENALTY; SPECULATIVE EXECUTION; MAPREDUCE; CONSUMPTION
DOI
10.1016/j.cie.2018.02.006
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The MapReduce framework has become a leading scheme for processing large-scale data applications in recent years. However, big data applications executed on computer clusters consume a large amount of energy, which accounts for a considerable fraction of a data center's overall costs. Reducing energy consumption has therefore become a critical issue for data centers. Although Hadoop YARN adopts fine-grained resource management schemes for job scheduling, it does not consider energy saving. In this paper, an Energy-aware Fair Scheduling framework based on YARN (denoted as EFS) is proposed, which can effectively reduce energy consumption while meeting the required Service Level Agreements (SLAs). EFS can not only schedule jobs to energy-efficient nodes but also power nodes on or off. To do so, energy-aware dynamic capacity management with a deadline-driven policy is used to allocate resources for MapReduce tasks according to the average execution time of containers and the resources requested by users. Then, the energy-aware fair scheduling problem is modeled as a multi-dimensional knapsack problem (MKP), and an energy-aware greedy algorithm (EAGA) is proposed to realize fine-grained task placement on energy-efficient nodes. Finally, nodes that have remained idle for a threshold duration are turned off to reduce energy costs. We perform extensive experiments on Hadoop YARN clusters to compare the energy consumption and execution time of EFS with those of some state-of-the-art policies. The experimental results show that EFS can not only keep the proper number of nodes powered on to meet the computing requirements but also achieve energy savings.
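The placement and power-off steps described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's EAGA: the abstract does not give the algorithm's details, so the `Node` fields, the `watts_per_core` efficiency score, the largest-task-first ordering, and the function names `place_tasks` and `power_off_idle` are all assumptions made for illustration. It shows only the general idea of greedily packing tasks with multi-dimensional demands (CPU and memory) onto the most energy-efficient node that fits, then turning off nodes that have stayed idle past a threshold.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu: int                  # free vcores (assumed resource dimension)
    mem: int                  # free memory in MB (assumed resource dimension)
    watts_per_core: float     # hypothetical efficiency score; lower is better
    idle_seconds: float = 0.0
    powered_on: bool = True
    tasks: list = field(default_factory=list)


def place_tasks(tasks, nodes):
    """Greedily place (task_id, cpu, mem) tuples on powered-on nodes.

    Each task goes to the most energy-efficient node that still has
    enough free capacity in every dimension. Returns the ids of tasks
    that could not be placed.
    """
    unplaced = []
    # Assumed heuristic: consider the largest tasks first.
    for task_id, cpu, mem in sorted(tasks, key=lambda t: (t[1], t[2]), reverse=True):
        candidates = [n for n in nodes
                      if n.powered_on and n.cpu >= cpu and n.mem >= mem]
        if not candidates:
            unplaced.append(task_id)
            continue
        best = min(candidates, key=lambda n: n.watts_per_core)
        best.cpu -= cpu
        best.mem -= mem
        best.idle_seconds = 0.0   # the node is no longer idle
        best.tasks.append(task_id)
    return unplaced


def power_off_idle(nodes, threshold_seconds):
    """Turn off task-free nodes idle for at least threshold_seconds."""
    turned_off = []
    for n in nodes:
        if n.powered_on and not n.tasks and n.idle_seconds >= threshold_seconds:
            n.powered_on = False
            turned_off.append(n.name)
    return turned_off


if __name__ == "__main__":
    cluster = [
        Node("a", cpu=4, mem=8192, watts_per_core=10.0),
        Node("b", cpu=4, mem=8192, watts_per_core=25.0, idle_seconds=600.0),
        Node("c", cpu=2, mem=4096, watts_per_core=15.0, idle_seconds=30.0),
    ]
    jobs = [("t1", 2, 2048), ("t2", 2, 4096), ("t3", 1, 1024)]
    print("unplaced:", place_tasks(jobs, cluster))
    print("powered off:", power_off_idle(cluster, threshold_seconds=300.0))
```

A real scheduler would also need the deadline-driven capacity estimate and the ability to power nodes back on when demand rises, both of which the abstract mentions but this sketch leaves out.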
Pages: 249-261
Number of pages: 13