Delay-aware model-based reinforcement learning for continuous control

Cited by: 39

Authors:

Chen, Baiming [1]

Xu, Mengdi [2]

Li, Liang [1]

Zhao, Ding [2]

Affiliations:

[1] Tsinghua University, Beijing 100084, People's Republic of China

[2] Carnegie Mellon University, Pittsburgh, PA 15213, USA

Source:

NEUROCOMPUTING | 2021 / Volume 450

Keywords:

Model-based reinforcement learning; Markov decision process; Continuous control; Delayed system; Finite spectrum assignment; Smith predictor; Systems; Integrator; Stability; Robot

DOI:

10.1016/j.neucom.2021.04.015

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Action delays degrade the performance of reinforcement learning in many real-world systems. This paper proposes a formal definition of delay-aware Markov Decision Process and proves it can be transformed into standard MDP with augmented states using the Markov reward process. We develop a delay-aware model-based reinforcement learning framework that can incorporate the multi-step delay into the learned system models without learning effort. Experiments with the Gym and MuJoCo platforms show that the proposed delay-aware model-based algorithm is more efficient in training and transferable between systems with various durations of delay compared with state-of-the-art model-free reinforcement learning methods. (c) 2021 Elsevier B.V. All rights reserved.

Citation:

Pages: 119-128

Page count: 10
