Performance and Cost-Efficient Spark Job Scheduling Based on Deep Reinforcement Learning in Cloud Computing Environments

Cited by: 55
Authors
Islam, Muhammed Tawfiqul [1]
Karunasekera, Shanika [1]
Buyya, Rajkumar [1]
Affiliations
[1] Univ Melbourne, Sch Comp & Informat Syst, Cloud Comp & Distributed Syst CLOUDS, Melbourne, Vic 3010, Australia
Funding
Australian Research Council
Keywords
Sparks; Cloud computing; Costs; Task analysis; Service level agreements; Big Data; Reinforcement learning; cost-efficiency; performance improvement; deep reinforcement learning;
DOI
10.1109/TPDS.2021.3124670
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Big data frameworks such as Spark and Hadoop are widely adopted to run analytics jobs in both research and industry. The cloud offers affordable compute resources that are easier to manage. Hence, many organizations are shifting towards a cloud deployment of their big data computing clusters. However, job scheduling is a complex problem in the presence of various Service Level Agreement (SLA) objectives such as monetary cost reduction and job performance improvement. Most existing research does not address multiple objectives together and fails to capture the inherent cluster and workload characteristics. In this article, we formulate the job scheduling problem of a cloud-deployed Spark cluster and propose a novel Reinforcement Learning (RL) model to accommodate the SLA objectives. We develop the RL cluster environment and implement two Deep Reinforcement Learning (DRL) based schedulers in the TF-Agents framework. The proposed DRL-based scheduling agents work at a fine-grained level to place the executors of jobs while leveraging the pricing model of cloud VM instances. In addition, the DRL-based agents can learn the inherent characteristics of different types of jobs to find a proper placement that reduces both the total cluster VM usage cost and the average job duration. The results show that the proposed DRL-based algorithms can reduce the VM usage cost by up to 30%.
Pages: 1695 - 1710
Page count: 16
Related papers
50 in total
  • [31] Multi Objective Prioritized Workflow Scheduling Using Deep Reinforcement Based Learning in Cloud Computing
    Mangalampalli, Sudheer
    Hashmi, Syed Shakeel
    Gupta, Amit
    Karri, Ganesh Reddy
    Rajkumar, K. Varada
    Chakrabarti, Tulika
    Chakrabarti, Prasun
    Margala, Martin
    IEEE ACCESS, 2024, 12 : 5373 - 5392
  • [32] DRL-based and Bsld-Aware Job Scheduling for Apache Spark Cluster in Hybrid Cloud Computing Environments
    Shi, Wenhu
    Li, Hongjian
    Zeng, Hang
    JOURNAL OF GRID COMPUTING, 2022, 20 (04)
  • [35] Deep reinforcement learning-based algorithms selectors for the resource scheduling in hierarchical Cloud computing
    Zhou G.
    Wen R.
    Tian W.
    Buyya R.
    Journal of Network and Computer Applications, 2022, 208
  • [36] Job Shop Scheduling Problem Based on Deep Reinforcement Learning
    Li, Baoshuai
    Ye, Chunming
    Computer Engineering and Applications, 2024, 57 (23) : 248 - 254
  • [37] Cost-Efficient Computation Offloading in VEC Using Deep Reinforcement Learning Techniques
    Wang, Bingxin
    Tu, Dan
    Wang, Jie
    20TH INTERNATIONAL WIRELESS COMMUNICATIONS & MOBILE COMPUTING CONFERENCE, IWCMC 2024, 2024, : 296 - 300
  • [38] Introducing an improved deep reinforcement learning algorithm for task scheduling in cloud computing
    Salari-Hamzehkhani, Behnam
    Akbari, Mehdi
    Safi-Esfahani, Faramarz
    JOURNAL OF SUPERCOMPUTING, 2025, 81 (01):
  • [39] Deep Reinforcement Learning for Dynamic Task Scheduling in Edge-Cloud Environments
    Rani, D. Mamatha
    Supreethi, K. P.
    Jayasingh, Bipin Bihari
    INTERNATIONAL JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING SYSTEMS, 2024, 15 (10) : 837 - 850
  • [40] A reinforcement learning based job scheduling algorithm for heterogeneous computing environment
    Song, Yutao
    Li, Chen
    Tian, Lihua
    Song, Hui
    COMPUTERS & ELECTRICAL ENGINEERING, 2023, 107