Performance and Cost-Efficient Spark Job Scheduling Based on Deep Reinforcement Learning in Cloud Computing Environments

Cited by: 55
Authors
Islam, Muhammed Tawfiqul [1]
Karunasekera, Shanika [1]
Buyya, Rajkumar [1]
Affiliations
[1] Univ Melbourne, Sch Comp & Informat Syst, Cloud Comp & Distributed Syst CLOUDS, Melbourne, Vic 3010, Australia
Funding
Australian Research Council
Keywords
Sparks; Cloud computing; Costs; Task analysis; Service level agreements; Big Data; Reinforcement learning; cost-efficiency; performance improvement; deep reinforcement learning
DOI
10.1109/TPDS.2021.3124670
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Big data frameworks such as Spark and Hadoop are widely adopted to run analytics jobs in both research and industry. The cloud offers affordable compute resources that are easier to manage. Hence, many organizations are shifting towards a cloud deployment of their big data computing clusters. However, job scheduling is a complex problem in the presence of various Service Level Agreement (SLA) objectives such as monetary cost reduction and job performance improvement. Most existing research does not address multiple objectives together and fails to capture the inherent cluster and workload characteristics. In this article, we formulate the job scheduling problem of a cloud-deployed Spark cluster and propose a novel Reinforcement Learning (RL) model to accommodate the SLA objectives. We develop the RL cluster environment and implement two Deep Reinforcement Learning (DRL) based schedulers in the TF-Agents framework. The proposed DRL-based scheduling agents work at a fine-grained level to place the executors of jobs while leveraging the pricing model of cloud VM instances. In addition, the DRL-based agents can also learn the inherent characteristics of different types of jobs to find a proper placement to reduce both the total cluster VM usage cost and the average job duration. The results show that the proposed DRL-based algorithms can reduce the VM usage cost by up to 30%.
Pages: 1695-1710
Page count: 16