Geometric deep reinforcement learning for dynamic DAG scheduling

Cited by: 0
Authors
Grinsztajn, Nathan [1]
Beaumont, Olivier [2]
Jeannot, Emmanuel [3]
Preux, Philippe [1]
Affiliations
[1] Univ Lille, CNRS, UMR 9189 CRIStAL, INRIA, Lille, France
[2] Inria Bordeaux, Hiepacs Team, Bordeaux, France
[3] Inria Bordeaux, TADaaM Team, Bordeaux, France
Source
2020 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI) | 2020
Keywords
Reinforcement learning; scheduling; task graph; DAG; high performance computing; combinatorial optimization
DOI
10.1109/ssci47803.2020.9308278
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In practice, it is quite common to face combinatorial optimization problems which contain uncertainty along with non-determinism and dynamicity. These three properties call for appropriate algorithms; reinforcement learning (RL) deals with them in a very natural way. Today, despite some efforts, most real-life combinatorial optimization problems remain out of the reach of reinforcement learning algorithms. In this paper, we propose a reinforcement learning approach to solve a realistic scheduling problem, and apply it to an algorithm commonly executed in the high performance computing community, the Cholesky factorization. In contrast to static scheduling, where tasks are assigned to processors in a predetermined ordering before the beginning of the parallel execution, our method is dynamic: task allocations and their execution ordering are decided at runtime, based on the system state and unexpected events, which allows much more flexibility. To do so, our algorithm uses graph neural networks in combination with an actor-critic algorithm (A2C) to build an adaptive representation of the problem on the fly. We show that this approach is competitive with state-of-the-art heuristics used in high performance computing runtime systems. Moreover, our algorithm does not require an explicit model of the environment, but we demonstrate that extra knowledge can easily be incorporated and improves the performance. We also exhibit key properties provided by this RL approach, and study its transfer abilities to other instances.
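To make the static/dynamic distinction in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It builds the task graph of a tiled Cholesky factorization (kernels POTRF, TRSM, SYRK, GEMM) and simulates a dynamic scheduler that picks a ready task each time a processor becomes idle. The function names `cholesky_dag` and `dynamic_schedule`, the unit kernel durations, the homogeneous processors, and the `fifo` policy are all assumptions made for illustration; in the paper the decision is made by a learned GNN + A2C agent, which is not reproduced here.

```python
import heapq


def cholesky_dag(T):
    """Task graph of a right-looking tiled Cholesky on a T x T tile grid.

    Returns {task: set of predecessor tasks}. Task naming is hypothetical:
    ("POTRF", k), ("TRSM", i, k), ("SYRK", i, k), ("GEMM", i, j, k).
    """
    deps = {}

    def add(task, *preds):
        deps.setdefault(task, set()).update(preds)

    for k in range(T):
        add(("POTRF", k))
        if k > 0:
            add(("POTRF", k), ("SYRK", k, k - 1))   # last update of tile (k, k)
        for i in range(k + 1, T):
            add(("TRSM", i, k), ("POTRF", k))
            if k > 0:
                add(("TRSM", i, k), ("GEMM", i, k, k - 1))
            add(("SYRK", i, k), ("TRSM", i, k))
            if k > 0:
                add(("SYRK", i, k), ("SYRK", i, k - 1))
            for j in range(k + 1, i):
                add(("GEMM", i, j, k), ("TRSM", i, k), ("TRSM", j, k))
                if k > 0:
                    add(("GEMM", i, j, k), ("GEMM", i, j, k - 1))
    return deps


def dynamic_schedule(deps, durations, n_procs, policy):
    """Event-driven simulation of dynamic scheduling; returns the makespan.

    `policy(ready)` chooses one task from the current ready list. This is
    the decision point where a learned agent could be plugged in.
    """
    remaining = {t: len(p) for t, p in deps.items()}
    succs = {t: [] for t in deps}
    for t, preds in deps.items():
        for p in preds:
            succs[p].append(t)
    ready = sorted(t for t, n in remaining.items() if n == 0)
    running = []                      # min-heap of (finish_time, task)
    now, idle = 0.0, n_procs
    while ready or running:
        # Decisions are taken at runtime, using only the current state.
        while ready and idle > 0:
            task = policy(ready)
            ready.remove(task)
            heapq.heappush(running, (now + durations[task[0]], task))
            idle -= 1
        now, done = heapq.heappop(running)   # advance to next completion
        idle += 1
        for s in succs[done]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
    return now


def fifo(ready):
    """Placeholder policy (assumption): run the oldest ready task."""
    return ready[0]


unit = {"POTRF": 1.0, "TRSM": 1.0, "SYRK": 1.0, "GEMM": 1.0}
dag = cholesky_dag(4)
makespan_2_procs = dynamic_schedule(dag, unit, 2, fifo)
```

Swapping `fifo` for another callable changes the schedule without touching the simulator. A static schedule would instead fix the task-to-processor assignment and ordering before `dynamic_schedule` starts.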
Pages: 258-265
Page count: 8