Performance and Cost-Efficient Spark Job Scheduling Based on Deep Reinforcement Learning in Cloud Computing Environments

Cited by: 55
Authors
Islam, Muhammed Tawfiqul [1]
Karunasekera, Shanika [1]
Buyya, Rajkumar [1]
Affiliations
[1] Univ Melbourne, Sch Comp & Informat Syst, Cloud Comp & Distributed Syst CLOUDS, Melbourne, Vic 3010, Australia
Funding
Australian Research Council
Keywords
Sparks; Cloud computing; Costs; Task analysis; Service level agreements; Big Data; Reinforcement learning; cost-efficiency; performance improvement; deep reinforcement learning;
DOI
10.1109/TPDS.2021.3124670
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Big data frameworks such as Spark and Hadoop are widely adopted to run analytics jobs in both research and industry. The cloud offers affordable compute resources that are easier to manage. Hence, many organizations are shifting towards a cloud deployment of their big data computing clusters. However, job scheduling is a complex problem in the presence of various Service Level Agreement (SLA) objectives such as monetary cost reduction and job performance improvement. Most of the existing research does not address multiple objectives together and fails to capture the inherent cluster and workload characteristics. In this article, we formulate the job scheduling problem of a cloud-deployed Spark cluster and propose a novel Reinforcement Learning (RL) model to accommodate the SLA objectives. We develop the RL cluster environment and implement two Deep Reinforcement Learning (DRL) based schedulers in the TF-Agents framework. The proposed DRL-based scheduling agents work at a fine-grained level to place the executors of jobs while leveraging the pricing model of cloud VM instances. In addition, the DRL-based agents can also learn the inherent characteristics of different types of jobs to find a proper placement that reduces both the total cluster VM usage cost and the average job duration. The results show that the proposed DRL-based algorithms can reduce the VM usage cost by up to 30%.
Pages: 1695-1710
Page count: 16