TD-regularized actor-critic methods

Cited by: 1
Authors
Simone Parisi
Voot Tangkaratt
Jan Peters
Mohammad Emtiyaz Khan
Affiliations
[1] Technische Universität Darmstadt
[2] RIKEN Center for Advanced Intelligence Project
[3] Max-Planck-Institut für Intelligente Systeme
Source
Machine Learning | 2019 / Volume 108
Keywords
Reinforcement learning; Actor-critic; Temporal difference
DOI
Not available
Abstract
Actor-critic methods can achieve impressive performance on difficult reinforcement learning problems, but they are also prone to instability. This is partly due to the interaction between the actor and the critic during learning, e.g., an inaccurate step taken by one of them can adversely affect the other and destabilize learning. To avoid such issues, we propose to regularize the learning objective of the actor by penalizing the temporal difference (TD) error of the critic. This improves stability by avoiding large steps in the actor update whenever the critic is highly inaccurate. The resulting method, which we call the TD-regularized actor-critic method, is a simple plug-and-play approach for improving the stability and overall performance of actor-critic methods, as evaluations on standard benchmarks confirm. Source code can be found at https://github.com/sparisi/td-reg.
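To make the idea concrete, below is a minimal, hypothetical sketch of a TD-regularized actor update in a DDPG-style setting with a Q-function critic. The network sizes, the penalty coefficient `eta`, and the dummy batch are illustrative assumptions, not values from the paper; consult the linked repository for the authors' actual implementation.

```python
# Hypothetical sketch of a TD-regularized actor loss (PyTorch).
# `eta`, the network shapes, and the dummy batch are illustrative only.
import torch
import torch.nn as nn

gamma, eta = 0.99, 0.1  # discount factor; TD-penalty weight (assumed)

# toy actor and critic for a 3-dim state and 1-dim bounded action
actor = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1), nn.Tanh())
critic = nn.Sequential(nn.Linear(3 + 1, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

def td_regularized_actor_loss(s, a, r, s_next, done):
    # usual deterministic-policy-gradient term: push pi toward high Q(s, pi(s))
    q_pi = critic(torch.cat([s, actor(s)], dim=-1)).squeeze(-1)
    # critic's TD error; the actor enters through the bootstrap action pi(s')
    q_next = critic(torch.cat([s_next, actor(s_next)], dim=-1)).squeeze(-1)
    q_sa = critic(torch.cat([s, a], dim=-1)).squeeze(-1)
    td_error = r + gamma * (1.0 - done) * q_next - q_sa
    # penalizing delta^2 discourages large actor steps wherever the critic
    # is inconsistent with its own bootstrap target
    return -q_pi.mean() + eta * (td_error ** 2).mean()

# one illustrative update on a random dummy batch of 32 transitions
s, a = torch.randn(32, 3), torch.rand(32, 1) * 2 - 1
r, s_next, done = torch.randn(32), torch.randn(32, 3), torch.zeros(32)
actor_opt.zero_grad()
td_regularized_actor_loss(s, a, r, s_next, done).backward()
actor_opt.step()
```

Because the bootstrap term Q(s', pi(s')) depends on the policy, the squared TD error contributes a gradient to the actor, damping its updates wherever the critic's estimates are still inconsistent.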
Pages: 1467-1501
Page count: 34
Related Papers
50 records in total
  • [21] Robust Actor-Critic With Relative Entropy Regulating Actor
    Cheng, Yuhu
    Huang, Longyang
    Chen, C. L. Philip
    Wang, Xuesong
    IEEE Transactions on Neural Networks and Learning Systems, 2023, 34 (11): 9054-9063
  • [22] Off-Policy Actor-Critic with Emphatic Weightings
    Graves, Eric
    Imani, Ehsan
    Kumaraswamy, Raksha
    White, Martha
    Journal of Machine Learning Research, 2023, 24
  • [23] Improving sample efficiency in Multi-Agent Actor-Critic methods
    Ye, Zhenhui
    Chen, Yining
    Jiang, Xiaohong
    Song, Guanghua
    Yang, Bowei
    Fan, Sheng
    Applied Intelligence, 2022, 52 (04): 3691-3704
  • [24] An Actor-Critic Method for Simulation-Based Optimization
    Li, Kuo
    Jia, Qing-Shan
    Yan, Jiaqi
    IFAC PapersOnLine, 2022, 55 (11): 7-12
  • [25] ACRE: Actor-Critic with Reward-Preserving Exploration
    Kapoutsis, Athanasios Ch.
    Koutras, Dimitrios I.
    Korkas, Christos D.
    Kosmatopoulos, Elias B.
    Neural Computing and Applications, 2023, 35: 22563-22576
  • [26] Looking Back on the Actor-Critic Architecture
    Barto, Andrew G.
    Sutton, Richard S.
    Anderson, Charles W.
    IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2021, 51 (01): 40-50
  • [28] A multi-agent reinforcement learning using Actor-Critic methods
    Li, Chun-Gui
    Wang, Meng
    Yuan, Qing-Neng
    Proceedings of the 2008 International Conference on Machine Learning and Cybernetics, Vols 1-7, 2008: 878-882
  • [29] An actor-critic model of saccade adaptation
    Inaba, Manabu
    Yamazaki, Tadashi
    BMC Neuroscience, 14 (Suppl 1)
  • [30] Genetic Network Programming with Actor-Critic
    Hatakeyama, Hiroyuki
    Mabu, Shingo
    Hirasawa, Kotaro
    Hu, Jinglu
    Journal of Advanced Computational Intelligence and Intelligent Informatics, 2007, 11 (01): 79-86