Deep Reinforcement Learning with Different Rewards for Scheduling in High-Performance Computing Systems

Cited by: 2
Authors
Reza, Md Farhadur [1 ]
Zhao, Bo [1 ]
Affiliation
[1] Univ Cent Missouri, Sch Comp Sci & Math, Warrensburg, MO 64093 USA
Source
2021 IEEE INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2021
DOI
10.1109/MWSCAS47672.2021.9531852
CLC number: TP [Automation technology, computer technology]
Subject classification: 0812
Abstract
Scheduling is a challenging task for high-performance computing systems, since it involves complex allocation of various types of resources among jobs with different characteristics. Because incoming jobs vary in their resource requests and may interact with other jobs, heuristic-based scheduling algorithms tend to be suboptimal and require a substantial amount of time to design and test under diverse conditions. As a result, reinforcement learning (RL) based approaches have been proposed to tackle various job scheduling challenges. We use deep neural networks to approximate the decisions of our RL agents, since table-based RL agents do not scale to large problem sizes. The performance of RL agents, however, has proven to be notoriously unstable and sensitive to training hyperparameters and to the reward signal. In this work, we study how different reward signals affect the RL agents' performance. We trained RL agents with four different reward signals, and simulation results under Alibaba workloads show that the trained RL agents improve performance for 60-65% of the jobset compared to two popular heuristics.
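The abstract does not spell out the four reward signals used in training. As a rough illustration only, the sketch below implements four reward formulations that are common in RL-based scheduling work (slowdown, turnaround, waiting time, makespan); the `Job` record and all function names are assumptions for this sketch, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Job:
    submit: float   # submission time
    runtime: float  # execution time
    finish: float   # completion time assigned by the scheduler

# Four candidate per-episode reward signals for an RL scheduling agent.
# Each is negated so that "higher reward" means "better schedule".

def reward_neg_slowdown(jobs):
    # Negative mean slowdown: (wait + runtime) / runtime per job.
    return -sum((j.finish - j.submit) / j.runtime for j in jobs) / len(jobs)

def reward_neg_turnaround(jobs):
    # Negative mean turnaround time (completion minus submission).
    return -sum(j.finish - j.submit for j in jobs) / len(jobs)

def reward_neg_waiting(jobs):
    # Negative mean waiting time (turnaround minus runtime).
    return -sum(j.finish - j.submit - j.runtime for j in jobs) / len(jobs)

def reward_neg_makespan(jobs):
    # Negative makespan: time at which the last job completes.
    return -max(j.finish for j in jobs)
```

Swapping the reward function while holding the agent, workload, and hyperparameters fixed is the kind of controlled comparison the abstract describes; each choice biases the learned policy toward a different scheduling objective.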
Pages: 183-186 (4 pages)
Related papers (50 entries)
  • [1] DRAS: Deep Reinforcement Learning for Cluster Scheduling in High Performance Computing
    Fan, Yuping
    Li, Boyang
    Favorite, Dustin
    Singh, Naunidh
    Childers, Taylor
    Rich, Paul
    Allcock, William
    Papka, Michael E.
    Lan, Zhiling
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (12) : 4903 - 4917
  • [2] Optimization of High-Performance Computing Job Scheduling Based on Offline Reinforcement Learning
    Li, Shihao
    Dai, Wei
    Chen, Yongyan
    Liang, Bo
    APPLIED SCIENCES-BASEL, 2024, 14 (23):
  • [3] GARLSched: Generative adversarial deep reinforcement learning task scheduling optimization for large-scale high performance computing systems
    Li, Jingbo
    Zhang, Xingjun
    Wei, Jia
    Ji, Zeyu
    Wei, Zheng
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2022, 135 : 259 - 269
  • [4] Breast histopathology with high-performance computing and deep learning
    Graziani M.
    Eggel I.
    Deligand F.
    Bobák M.
    Andrearczyk V.
    Müller H.
    Computing and Informatics, 2021, 39 (04) : 780 - 807
  • [6] Parallel Simulation of Tasks Scheduling and Scheduling Criteria in High-performance Computing Systems
    Skrinarova, Jarmila
    Povinsky, Michal
    JOURNAL OF INFORMATION AND ORGANIZATIONAL SCIENCES, 2019, 43 (02) : 211 - 228
  • [7] High-Performance UAV Crowdsensing: A Deep Reinforcement Learning Approach
    Wei, Kaimin
    Huang, Kai
    Wu, Yongdong
    Li, Zhetao
    He, Hongliang
    Zhang, Jilian
    Chen, Jinpeng
    Guo, Song
    IEEE INTERNET OF THINGS JOURNAL, 2022, 9 (19) : 18487 - 18499
  • [8] A deep reinforcement learning control approach for high-performance aircraft
    De Marco, Agostino
    D'Onza, Paolo Maria
    Manfredi, Sabato
    NONLINEAR DYNAMICS, 2023, 111 (18) : 17037 - 17077
  • [9] OKCM: improving parallel task scheduling in high-performance computing systems using online learning
    Li, Jingbo
    Zhang, Xingjun
    Han, Li
    Ji, Zeyu
    Dong, Xiaoshe
    Hu, Chenglong
    JOURNAL OF SUPERCOMPUTING, 2021, 77 (06) : 5960 - 5983