Robustness challenges in Reinforcement Learning based time-critical cloud resource scheduling: A Meta-Learning based solution

Cited by: 8
Authors
Liu, Hongyun [1, 2]
Chen, Peng [3 ]
Ouyang, Xue [4 ]
Gao, Hui [5 ]
Yan, Bing [6 ]
Grosso, Paola [1 ]
Zhao, Zhiming [1 ]
Affiliations
[1] Univ Amsterdam, Informat Inst, NL-1098 XH Amsterdam, Netherlands
[2] Univ Amsterdam, Grad Sch Informat, NL-1098 XH Amsterdam, Netherlands
[3] Xihua Univ, Sch Comp & Software Engn, Chengdu 610039, Peoples R China
[4] Natl Univ Def Technol, Sch Comp Sci, Changsha 410073, Peoples R China
[5] Shaanxi Univ Sci & Technol, Coll Elect & Control Engn, Xian 710021, Peoples R China
[6] Univ Adelaide, Sch Elect & Elect Engn, Adelaide, SA 5005, Australia
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2023 / Vol. 146
Funding
National Natural Science Foundation of China;
Keywords
Robustness; Reinforcement Learning; Meta Learning; Resource management; Task scheduling; Cloud computing; MANAGEMENT;
DOI
10.1016/j.future.2023.03.029
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Cloud computing attracts increasing attention for processing dynamic computing tasks and automating the software development and operation pipeline. In many cases, the computing tasks have strict deadlines. The cloud resource manager (e.g., orchestrator) manages the resources and provides Quality of Service (QoS) for tasks. Cloud task scheduling is difficult because task workload and resource availability are dynamic. Reinforcement Learning (RL) has attracted considerable research attention in scheduling. However, RL-based approaches suffer from low scheduling performance robustness when the task workload and resource availability change, particularly when handling time-critical tasks. This paper focuses on the challenges of both robustness and deadline guarantee in such RL-based, specifically Deep RL (DRL)-based, scheduling approaches. We quantify robustness as the retraining time and investigate how to improve both the robustness and the deadline guarantee of DRL-based scheduling. We propose MLR-TC-DRLS, a practical, robust Meta Deep Reinforcement Learning-based scheduling solution that provides deadline guarantees for time-critical tasks and fast adaptation under highly dynamic situations. We comprehensively evaluate the performance of MLR-TC-DRLS against RL-based scheduling approaches and approaches based on advanced RL variants, using real-world and synthetic data. The evaluations validate that our proposed approach improves the scheduling performance robustness of typical DRL-variant scheduling approaches, with 97%-98.5% deadline guarantees and 200%-500% faster adaptation.
Pages: 18-33
Number of pages: 16