Dejavu: Reinforcement Learning-based Cloud Scheduling with Demonstration and Competition

Times cited: 0

Authors:

Kim, Seonwoo [1]

Nam, Yoonsung [2]

Park, Minwoo [2]

Lee, Heewon [2]

Kim, Seyeon [1]

Ha, Sangtae [1]

Affiliations:

[1] Univ Colorado, Boulder, CO 80309 USA

[2] Samsung Elect, Suwon, South Korea

Source:

2024 IEEE 21ST INTERNATIONAL CONFERENCE ON MOBILE AD-HOC AND SMART SYSTEMS, MASS 2024 | 2024

Funding:

U.S. National Science Foundation;

Keywords:

Container-based cloud; Scheduling; Reinforcement learning; Offline RL;

DOI:

10.1109/MASS62177.2024.00068

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

As cloud adoption surges across industries, the limitations of the default scheduler, particularly at large scale or for jobs outside its original design scope, have become increasingly prominent. While the default schedulers of various cloud platforms were engineered primarily for simple and predictable tasks, reinforcement learning (RL)-based schedulers are attracting attention because they can model larger and more diverse cloud environments. Nevertheless, RL faces practical constraints: the agent must be retrained to adapt to each new environment, and the exploration performed during training can cause unexpected performance degradation at runtime. To address these issues, this paper presents Dejavu, which combines reinforcement learning with neural networks to learn and solve scheduling problems more effectively. To reduce the long training time and the performance degradation caused by unexpected exploration, we pretrain the agent on demonstrations from existing heuristics, which guides it to explore safely and efficiently. Furthermore, we design a robust reward function that pushes Dejavu to compete with, and eventually outperform, the heuristics it learns from and the other baselines. The experimental results demonstrate the efficacy of Dejavu, showing improvements in key metrics: compared to the default scheduler, it increases resource utilization by 6% and shortens scheduling time by 3% during the scheduling period.
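The abstract states that Dejavu is pretrained on demonstrations from existing heuristics but does not give the loss. One well-known way to pretrain a value-based agent from demonstrations comes from Deep Q-learning from Demonstrations (Hester et al., AAAI 2018): a large-margin supervised loss that pushes the Q-value of the demonstrator's action above every other action by a margin. The sketch below is a minimal Python version, assuming an agent that scores each candidate node for a pending container; whether Dejavu uses this exact loss is an assumption, and the function names and the margin of 0.8 are illustrative.

```python
from typing import Sequence, Tuple, List


def large_margin_loss(q_values: Sequence[float], expert_action: int,
                      margin: float = 0.8) -> float:
    """DQfD-style supervised margin loss for one state (illustrative sketch).

    q_values[a] is the agent's estimate Q(s, a) for placing the pending
    container on candidate node a (hypothetical action space); expert_action
    is the node the heuristic scheduler (the demonstrator) chose in state s.
    """
    # Every non-expert action receives a margin bonus; the expert action none.
    best = max(q + (0.0 if a == expert_action else margin)
               for a, q in enumerate(q_values))
    # Zero only when the expert action beats all others by at least `margin`.
    return best - q_values[expert_action]


def pretrain_loss(batch: List[Tuple[Sequence[float], int]],
                  margin: float = 0.8) -> float:
    """Mean margin loss over (q_values, expert_action) demonstration pairs."""
    losses = [large_margin_loss(q, a, margin) for q, a in batch]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical Q-values for three candidate nodes.
    print(large_margin_loss([1.0, 2.0, 0.5], expert_action=1))  # 0.0: agent already agrees
    print(large_margin_loss([1.0, 2.0, 0.5], expert_action=0))  # approximately 1.8
```

In DQfD this supervised term is added to the usual temporal-difference loss, so the agent starts close to the heuristic's behavior while remaining free to learn placements that outperform it.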


Pages: 469-478

Number of pages: 10

References (21 in total):

[1] Abbeel Pieter, 2010, Inverse Reinforcement Learning, P554

[2] Bao YX, 2019, IEEE INFOCOM SER, P505, DOI 10.1109/INFOCOM.2019.8737460

[3] Fan Yuping, 2021, 2021 IEEE INT PAR DI

[4] Grandl, Robert; Ananthanarayanan, Ganesh; Kandula, Srikanth; Rao, Sriram; Akella, Aditya. Multi-Resource Packing for Cluster Schedulers. ACM SIGCOMM Computer Communication Review, 2014, 44(4): 455-466.

[5] Guo, Jing; Chang, Zihao; Wang, Sa; Ding, Haiyang; Feng, Yihui; Mao, Liang; Bao, Yungang. Who Limits the Resource Efficiency of My Datacenter: An Analysis of Alibaba Datacenter Traces. Proceedings of the IEEE/ACM International Symposium on Quality of Service (IWQoS 2019), 2019.

[6] Hester T, 2018, AAAI CONF ARTIF INTE, P3223

[7] Ho J, 2016, ADV NEUR IN, V29

[8] Huang, Jiaming; Xiao, Chuming; Wu, Weigang. RLSK: A Job Scheduler for Federated Kubernetes Clusters based on Reinforcement Learning. 2020 IEEE International Conference on Cloud Engineering (IC2E 2020), 2020: 116-123.

[9] Mao, Hongzi; Alizadeh, Mohammad; Menache, Ishai; Kandula, Srikanth. Resource Management with Deep Reinforcement Learning. Proceedings of the 15th ACM Workshop on Hot Topics in Networks (HotNets '16), 2016: 50-56.

[10] Mnih V, 2016, PR MACH LEARN RES, V48
