Deep Reinforcement Learning with Different Rewards for Scheduling in High-Performance Computing Systems

Cited by: 2
Authors
Reza, Md Farhadur [1 ]
Zhao, Bo [1 ]
Affiliation
[1] Univ Cent Missouri, Sch Comp Sci & Math, Warrensburg, MO 64093 USA
Source
2021 IEEE INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2021
DOI
10.1109/MWSCAS47672.2021.9531852
CLC number: TP [Automation technology, computer technology]
Subject classification: 0812
Abstract
Scheduling is a challenging task for high-performance computing systems, since it involves complex allocation of various types of resources among jobs with different characteristics. Because incoming jobs vary in their resource requests and may interact with other jobs, heuristic-based scheduling algorithms tend to be suboptimal and require a substantial amount of time to design and test under diverse conditions. As a result, reinforcement learning (RL) based approaches have been proposed to tackle various job scheduling challenges. We use deep neural networks to approximate the decisions of our RL agents, since table-based RL agents do not scale to large problem sizes. The performance of RL agents, however, has proven to be notoriously unstable and sensitive to training hyperparameters and to the reward signal. In this work, we study how different reward signals affect the RL agents' performance. We trained RL agents with four different reward signals, and simulation results under Alibaba workloads show that the trained RL agents improve performance for 60-65% of the jobset compared to two popular heuristics.
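The abstract does not spell out the four reward signals used in training. As a rough illustration only, the sketch below implements four reward formulations that are common in RL-based scheduling work (slowdown, turnaround, waiting time, makespan); the `Job` record and all function names are assumptions for this sketch, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Job:
    submit: float   # submission time
    runtime: float  # execution time
    finish: float   # completion time assigned by the scheduler

# Four candidate per-episode reward signals for an RL scheduling agent.
# Each is negated so that "higher reward" means "better schedule".

def reward_neg_slowdown(jobs):
    # Negative mean slowdown: (wait + runtime) / runtime per job.
    return -sum((j.finish - j.submit) / j.runtime for j in jobs) / len(jobs)

def reward_neg_turnaround(jobs):
    # Negative mean turnaround time (completion minus submission).
    return -sum(j.finish - j.submit for j in jobs) / len(jobs)

def reward_neg_waiting(jobs):
    # Negative mean waiting time (turnaround minus runtime).
    return -sum(j.finish - j.submit - j.runtime for j in jobs) / len(jobs)

def reward_neg_makespan(jobs):
    # Negative makespan: time at which the last job completes.
    return -max(j.finish for j in jobs)
```

Swapping the reward function while holding the agent, workload, and hyperparameters fixed is the kind of controlled comparison the abstract describes; each choice biases the learned policy toward a different scheduling objective.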
Pages: 183-186 (4 pages)
Related papers (50 entries)
  • [1] DRAS: Deep Reinforcement Learning for Cluster Scheduling in High Performance Computing
    Fan, Yuping
    Li, Boyang
    Favorite, Dustin
    Singh, Naunidh
    Childers, Taylor
    Rich, Paul
    Allcock, William
    Papka, Michael E.
    Lan, Zhiling
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (12) : 4903 - 4917
  • [2] Optimization of High-Performance Computing Job Scheduling Based on Offline Reinforcement Learning
    Li, Shihao
    Dai, Wei
    Chen, Yongyan
    Liang, Bo
    APPLIED SCIENCES-BASEL, 2024, 14 (23):
  • [3] GARLSched: Generative adversarial deep reinforcement learning task scheduling optimization for large-scale high performance computing systems
    Li, Jingbo
    Zhang, Xingjun
    Wei, Jia
    Ji, Zeyu
    Wei, Zheng
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2022, 135 : 259 - 269
  • [4] Breast histopathology with high-performance computing and deep learning
    Graziani M.
    Eggel I.
    Deligand F.
    Bobák M.
    Andrearczyk V.
    Müller H.
    Computing and Informatics, 2021, 39 (04) : 780 - 807
  • [6] Parallel Simulation of Tasks Scheduling and Scheduling Criteria in High-performance Computing Systems
    Skrinarova, Jarmila
    Povinsky, Michal
    JOURNAL OF INFORMATION AND ORGANIZATIONAL SCIENCES, 2019, 43 (02) : 211 - 228
  • [7] High-Performance UAV Crowdsensing: A Deep Reinforcement Learning Approach
    Wei, Kaimin
    Huang, Kai
    Wu, Yongdong
    Li, Zhetao
    He, Hongliang
    Zhang, Jilian
    Chen, Jinpeng
    Guo, Song
    IEEE INTERNET OF THINGS JOURNAL, 2022, 9 (19) : 18487 - 18499
  • [8] A deep reinforcement learning control approach for high-performance aircraft
    De Marco, Agostino
    D'Onza, Paolo Maria
    Manfredi, Sabato
    NONLINEAR DYNAMICS, 2023, 111 (18) : 17037 - 17077
  • [9] OKCM: improving parallel task scheduling in high-performance computing systems using online learning
    Li, Jingbo
    Zhang, Xingjun
    Han, Li
    Ji, Zeyu
    Dong, Xiaoshe
    Hu, Chenglong
    JOURNAL OF SUPERCOMPUTING, 2021, 77 (06) : 5960 - 5983