TD-regularized actor-critic methods

Cited by: 1
Authors
Simone Parisi
Voot Tangkaratt
Jan Peters
Mohammad Emtiyaz Khan
Affiliations
[1] Technische Universität Darmstadt
[2] RIKEN Center for Advanced Intelligence Project
[3] Max-Planck-Institut für Intelligente Systeme
Source
Machine Learning | 2019 / Volume 108
Keywords
Reinforcement learning; Actor-critic; Temporal difference
DOI
Not available
Abstract
Actor-critic methods can achieve incredible performance on difficult reinforcement learning problems, but they are also prone to instability. This is partly due to the interaction between the actor and critic during learning, e.g., an inaccurate step taken by one of them might adversely affect the other and destabilize the learning. To avoid such issues, we propose to regularize the learning objective of the actor by penalizing the temporal difference (TD) error of the critic. This improves stability by avoiding large steps in the actor update whenever the critic is highly inaccurate. The resulting method, which we call the TD-regularized actor-critic method, is a simple plug-and-play approach to improve stability and overall performance of the actor-critic methods. Evaluations on standard benchmarks confirm this. Source code can be found at https://github.com/sparisi/td-reg.
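The idea described in the abstract can be written as a regularized actor objective, J_reg(θ) = J(θ) − η E[δ²], where δ = r + γV(s') − V(s) is the critic's one-step TD error and η is the penalty weight. Below is a minimal sketch of such an update in a PyTorch-style one-step actor-critic setting; it is an illustration only, not the authors' implementation (their official code is at https://github.com/sparisi/td-reg), and the names policy, value_fn, and eta are placeholders introduced here.

    import torch

    def td_regularized_actor_loss(policy, value_fn, states, actions,
                                  rewards, next_states, dones,
                                  gamma=0.99, eta=0.1):
        # Assumes value_fn(s) returns V(s) with shape [batch, 1] and
        # policy(s) returns a torch.distributions.Distribution over actions.
        with torch.no_grad():
            v = value_fn(states).squeeze(-1)
            v_next = value_fn(next_states).squeeze(-1)
            # One-step TD error of the critic: delta = r + gamma * V(s') - V(s).
            td_error = rewards + gamma * (1.0 - dones) * v_next - v

        # Penalize the one-step advantage (here simply delta) by the squared
        # TD error, so the actor takes smaller steps where the critic is
        # inaccurate (large |delta|).
        regularized_advantage = td_error - eta * td_error ** 2

        log_prob = policy(states).log_prob(actions)
        return -(log_prob * regularized_advantage).mean()

Minimizing this loss follows the gradient E[∇log π(a|s)(δ − η δ²)], i.e., the usual policy-gradient step with the TD penalty folded into the advantage; the critic itself is trained as usual on the squared TD error.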
Pages: 1467-1501
Number of pages: 34
Related papers
50 records in total
  • [31] Optimal Actor-Critic Policy With Optimized Training Datasets
    Banerjee, Chayan
    Chen, Zhiyong
    Noman, Nasimul
    Zamani, Mohsen
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2022, 6 (06): 1324 - 1334
  • [32] Efficient data use in incremental actor-critic algorithms
    Cheng, Yuhu
    Feng, Huanting
    Wang, Xuesong
    NEUROCOMPUTING, 2013, 116 : 346 - 354
  • [33] Boosting On-Policy Actor-Critic With Shallow Updates in Critic
    Li, Luntong
    Zhu, Yuanheng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024: 1 - 10
  • [34] Advantage Actor-Critic for Autonomous Intersection Management
    Ayeelyan, John
    Lee, Guan-Hung
    Hsu, Hsiu-Chun
    Hsiung, Pao-Ann
    VEHICLES, 2022, 4 (04): 1391 - 1412
  • [35] Sustainable l2-Regularized Actor-Critic based on Recursive Least-Squares Temporal Difference Learning
    Li, Luntong
    Li, Dazi
    Song, Tianheng
    2017 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2017, : 1886 - 1891
  • [36] Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning
    Veeriah, Vivek
    van Seijen, Harm
    Sutton, Richard S.
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 556 - 564
  • [37] A World Model for Actor-Critic in Reinforcement Learning
    Panov, A. I.
    Ugadiarov, L. A.
    PATTERN RECOGNITION AND IMAGE ANALYSIS, 2023, 33 (03) : 467 - 477
  • [38] Actor-Critic Algorithm with Transition Cost Estimation
    Sergey, Denisov
    Lee, Jee-Hyong
    INTERNATIONAL JOURNAL OF FUZZY LOGIC AND INTELLIGENT SYSTEMS, 2016, 16 (04) : 270 - 275
  • [39] Diffusion welding furnace temperature controller based on Actor-Critic
    Li, Qiang
    Li, Gang
    Wang, Xing
    Wei, Min
    PROCEEDINGS OF THE 38TH CHINESE CONTROL CONFERENCE (CCC), 2019, : 2484 - 2487
  • [40] Finite-Time Analysis of Natural Actor-Critic for POMDPs
    Cayci, Semih
    He, Niao
    Srikant, R.
    SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2024, 6 (04): 869 - 896