TD-regularized actor-critic methods

Cited by: 1
Authors
Simone Parisi
Voot Tangkaratt
Jan Peters
Mohammad Emtiyaz Khan
Affiliations
[1] Technische Universität Darmstadt
[2] RIKEN Center for Advanced Intelligence Project
[3] Max-Planck-Institut für Intelligente Systeme
Source
Machine Learning | 2019 / Volume 108
Keywords
Reinforcement learning; Actor-critic; Temporal difference;
DOI
Not available
Abstract
Actor-critic methods can achieve incredible performance on difficult reinforcement learning problems, but they are also prone to instability. This is partly due to the interaction between the actor and critic during learning, e.g., an inaccurate step taken by one of them might adversely affect the other and destabilize the learning. To avoid such issues, we propose to regularize the learning objective of the actor by penalizing the temporal difference (TD) error of the critic. This improves stability by avoiding large steps in the actor update whenever the critic is highly inaccurate. The resulting method, which we call the TD-regularized actor-critic method, is a simple plug-and-play approach to improve stability and overall performance of the actor-critic methods. Evaluations on standard benchmarks confirm this. Source code can be found at https://github.com/sparisi/td-reg.
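The idea described in the abstract can be illustrated with a minimal sketch (not the authors' code; the function names and the squared-TD-error penalty with coefficient `eta` are assumptions based on the abstract): the one-step TD error of the critic is added as a penalty to the actor's objective, so the actor's update is damped wherever the critic is inaccurate.

```python
def td_error(reward, gamma, v_s, v_s_next):
    # One-step temporal-difference error of the critic:
    # delta = r + gamma * V(s') - V(s)
    return reward + gamma * v_s_next - v_s

def td_regularized_actor_loss(policy_loss, delta, eta=0.1):
    # Penalize the actor objective by the squared TD error, so the
    # actor takes smaller steps when the critic is highly inaccurate.
    return policy_loss + eta * delta ** 2

# Toy numbers for a single transition (illustrative only):
delta = td_error(reward=1.0, gamma=0.99, v_s=0.5, v_s_next=0.4)
loss = td_regularized_actor_loss(policy_loss=2.0, delta=delta)
```

In practice `policy_loss` would be any standard actor objective (e.g. a policy-gradient surrogate) averaged over a batch, and `eta` trades off stability against the original objective.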
Pages: 1467-1501
Page count: 34
Related papers
50 records in total
  • [41] Real-Time 'Actor-Critic' Tracking
    Chen, Boyu
    Wang, Dong
    Li, Peixia
    Wang, Shuang
    Lu, Huchuan
    COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211 : 328 - 345
  • [42] Bayesian Policy Gradient and Actor-Critic Algorithms
    Ghavamzadeh, Mohammad
    Engel, Yaakov
    Valko, Michal
    JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17
  • [43] A fuzzy Actor-Critic reinforcement learning network
    Wang, Xue-Song
    Cheng, Yu-Hu
    Yi, Jian-Qiang
    INFORMATION SCIENCES, 2007, 177 (18) : 3764 - 3781
  • [44] Actor-critic with familiarity-based trajectory experience replay
    Gong, Xiaoyu
    Yu, Jiayu
    Lu, Shuai
    Lu, Hengwei
    INFORMATION SCIENCES, 2022, 582 : 633 - 647
  • [45] A Soft Actor-Critic Algorithm for Sequential Recommendation
    Hong, Hyejin
    Kimura, Yusuke
    Hatano, Kenji
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PT I, DEXA 2024, 2024, 14910 : 258 - 266
  • [46] Classical Actor-Critic Applied to the Control of a Self-Regulatory Process
    Bras, E. H.
    Louw, T. M.
    Bradshaw, S. M.
    IFAC PAPERSONLINE, 2023, 56 (02): : 7172 - 7177
  • [47] A Sample-Efficient Actor-Critic Algorithm for Recommendation Diversification
    Li, Shuang
    Yan, Yanghui
    Ren, Ju
    Zhou, Yuezhi
    Zhang, Yaoxue
    CHINESE JOURNAL OF ELECTRONICS, 2020, 29 (01) : 89 - 96
  • [48] Asymmetric Actor-Critic for Adapting to Changing Environments in Reinforcement Learning
    Yue, Wangyang
    Zhou, Yuan
    Zhang, Xiaochuan
    Hua, Yuchen
    Li, Minne
    Fan, Zunlin
    Wang, Zhiyuan
    Kou, Guang
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT IV, 2024, 15019 : 325 - 339
  • [49] Actor-critic learning based PID control for robotic manipulators
    Nohooji, Hamed Rahimi
    Zaraki, Abolfazl
    Voos, Holger
    APPLIED SOFT COMPUTING, 2024, 151