TD-regularized actor-critic methods

Cited by: 1
Authors
Simone Parisi
Voot Tangkaratt
Jan Peters
Mohammad Emtiyaz Khan
Affiliations
[1] Technische Universität Darmstadt
[2] RIKEN Center for Advanced Intelligence Project
[3] Max-Planck-Institut für Intelligente Systeme
Source
Machine Learning | 2019 / Vol. 108
Keywords
Reinforcement learning; Actor-critic; Temporal difference;
DOI
Not available
Abstract
Actor-critic methods can achieve incredible performance on difficult reinforcement learning problems, but they are also prone to instability. This is partly due to the interaction between the actor and critic during learning: for example, an inaccurate step taken by one of them can adversely affect the other and destabilize learning. To avoid such issues, we propose to regularize the learning objective of the actor by penalizing the temporal difference (TD) error of the critic. This improves stability by avoiding large actor updates whenever the critic is highly inaccurate. The resulting method, which we call the TD-regularized actor-critic method, is a simple plug-and-play approach for improving the stability and overall performance of actor-critic methods, and evaluations on standard benchmarks confirm this. The source code can be found at https://github.com/sparisi/td-reg.
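The following is a minimal sketch of the idea described in the abstract, assuming a one-step advantage actor-critic with a discrete policy in PyTorch. The networks, dimensions, the penalty weight eta, and the update function are illustrative assumptions rather than the paper's implementation (see the linked repository for that); the gradient of the TD penalty is estimated here with a score-function term.

# Minimal sketch of a TD-regularized actor-critic update (assumed one-step A2C setup).
# The actor objective is penalized by the critic's squared TD error, so the actor
# takes smaller steps where the critic is still inaccurate, as described in the abstract.
import torch
import torch.nn as nn

obs_dim, n_actions = 4, 2                      # illustrative sizes
actor = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, n_actions))
critic = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, 1))
opt_actor = torch.optim.Adam(actor.parameters(), lr=3e-4)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma, eta = 0.99, 0.1                         # discount and TD-penalty weight (assumed values)

def update(s, a, r, s_next, done):
    # One-step TD error of the critic.
    v = critic(s)
    td_target = r + gamma * (1.0 - done) * critic(s_next).detach()
    td_error = td_target - v

    # Critic update: standard squared TD-error loss.
    critic_loss = td_error.pow(2).mean()
    opt_critic.zero_grad()
    critic_loss.backward()
    opt_critic.step()

    # Actor update: policy gradient with the TD advantage, plus eta times the
    # squared TD error as a penalty. The penalty gradient is estimated with the
    # score-function (log-prob) trick; the paper derives the exact regularized gradient.
    log_prob = torch.distributions.Categorical(logits=actor(s)).log_prob(a)
    delta = td_error.detach().squeeze(-1)
    actor_loss = -(log_prob * delta).mean() + eta * (log_prob * delta.pow(2)).mean()
    opt_actor.zero_grad()
    actor_loss.backward()
    opt_actor.step()

# Illustrative call with a dummy batch of transitions.
s = torch.randn(8, obs_dim)
a = torch.randint(n_actions, (8,))
r = torch.randn(8, 1)
s_next = torch.randn(8, obs_dim)
done = torch.zeros(8, 1)
update(s, a, r, s_next, done)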
Pages: 1467 - 1501
Number of pages: 34
Related Papers
50 records in total
  • [1] TD-regularized actor-critic methods
    Parisi, Simone
    Tangkaratt, Voot
    Peters, Jan
    Khan, Mohammad Emtiyaz
    MACHINE LEARNING, 2019, 108 (8-9) : 1467 - 1501
  • [2] Error controlled actor-critic
    Gao, Xingen
    Chao, Fei
    Zhou, Changle
    Ge, Zhen
    Yang, Longzhi
    Chang, Xiang
    Shang, Changjing
    Shen, Qiang
    INFORMATION SCIENCES, 2022, 612 : 62 - 74
  • [3] Master-Slave Policy Collaboration for Actor-Critic Methods
    Li, Xiaomu
    Liu, Quan
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [4] Natural Actor-Critic
    Peters, Jan
    Schaal, Stefan
    NEUROCOMPUTING, 2008, 71 (7-9) : 1180 - 1190
  • [5] On actor-critic algorithms
    Konda, VR
    Tsitsiklis, JN
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2003, 42 (04) : 1143 - 1166
  • [6] Efficient Model Learning Methods for Actor-Critic Control
    Grondman, Ivo
    Vaandrager, Maarten
    Busoniu, Lucian
    Babuska, Robert
    Schuitema, Erik
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2012, 42 (03): : 591 - 602
  • [7] Variational actor-critic algorithms
    Zhu, Yuhua
    Ying, Lexing
    ESAIM-CONTROL OPTIMISATION AND CALCULUS OF VARIATIONS, 2023, 29
  • [8] Multi-actor mechanism for actor-critic reinforcement learning
    Li, Lin
    Li, Yuze
    Wei, Wei
    Zhang, Yujia
    Liang, Jiye
    INFORMATION SCIENCES, 2023, 647
  • [9] The Effect of Discounting Actor-loss in Actor-Critic Algorithm
    Yaputra, Jordi
    Suyanto, Suyanto
    2021 4TH INTERNATIONAL SEMINAR ON RESEARCH OF INFORMATION TECHNOLOGY AND INTELLIGENT SYSTEMS (ISRITI 2021), 2020,
  • [10] Noisy Importance Sampling Actor-Critic: An Off-Policy Actor-Critic With Experience Replay
    Tasfi, Norman
    Capretz, Miriam
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,