Cost-aware job scheduling for cloud instances using deep reinforcement learning

Cited by: 50
Authors
Cheng, Feng [1]
Huang, Yifeng [2]
Tanpure, Bhavana [3]
Sawalani, Pawan [3]
Cheng, Long [2, 4]
Liu, Cong [5]
Affiliations
[1] Southwest Jiaotong Univ, Sch Math, Chengdu, Peoples R China
[2] North China Elect Power Univ Beijing, Sch Control & Comp Engn, Beijing, Peoples R China
[3] Dublin City Univ, Sch Comp, Dublin, Ireland
[4] Dublin City Univ, Insight SFI Res Ctr Data Analyt, Dublin, Ireland
[5] Shandong Univ Technol, Sch Comp Sci & Technol, Zibo, Peoples R China
Source
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS | 2022 / Vol. 25 / Issue 01
Funding
US National Science Foundation;
Keywords
Cloud computing; Deep reinforcement learning; Deep Q-learning; QoS; Job scheduling; Cost optimization;
DOI
10.1007/s10586-021-03436-8
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
As cloud vendors offer better performance, auto-scaling, load balancing, and low infrastructure maintenance, more and more companies are migrating their services to the cloud. Because cloud workloads are dynamic and complex, scheduling user-submitted jobs effectively is a challenging task. Although many advanced job scheduling approaches have been proposed in recent years, almost all of them are designed to handle batch jobs rather than real-time workloads, in which user requests can arrive at any time and in any number. To tackle this problem, we propose a Deep Reinforcement Learning (DRL) based job scheduler that dispatches jobs in real time. Specifically, we focus on scheduling user requests so as to provide quality of service (QoS) to the end user while significantly reducing the cost of executing jobs on virtual instances. We have implemented our method using a Deep Q-learning Network (DQN) model, and our experimental results demonstrate that our approach can significantly outperform commonly used real-time scheduling algorithms.
Pages: 619-631
Page count: 13