TD-regularized actor-critic methods

Cited by: 1
Authors
Simone Parisi
Voot Tangkaratt
Jan Peters
Mohammad Emtiyaz Khan
Affiliations
[1] Technische Universität Darmstadt
[2] RIKEN Center for Advanced Intelligence Project
[3] Max-Planck-Institut für Intelligente Systeme
Source
Machine Learning | 2019 / Volume 108
Keywords
Reinforcement learning; Actor-critic; Temporal difference
DOI
Not available
Abstract
Actor-critic methods can achieve incredible performance on difficult reinforcement learning problems, but they are also prone to instability. This is partly due to the interaction between the actor and critic during learning, e.g., an inaccurate step taken by one of them might adversely affect the other and destabilize the learning. To avoid such issues, we propose to regularize the learning objective of the actor by penalizing the temporal difference (TD) error of the critic. This improves stability by avoiding large steps in the actor update whenever the critic is highly inaccurate. The resulting method, which we call the TD-regularized actor-critic method, is a simple plug-and-play approach to improve stability and overall performance of the actor-critic methods. Evaluations on standard benchmarks confirm this. Source code can be found at https://github.com/sparisi/td-reg.
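The idea described in the abstract can be written as a regularized actor objective, J_reg(θ) = J(θ) − η E[δ²], where δ = r + γV(s') − V(s) is the critic's one-step TD error and η is the penalty weight. Below is a minimal sketch of such an update in a PyTorch-style one-step actor-critic setting; it is an illustration only, not the authors' implementation (their official code is at https://github.com/sparisi/td-reg), and the names policy, value_fn, and eta are placeholders introduced here.

    import torch

    def td_regularized_actor_loss(policy, value_fn, states, actions,
                                  rewards, next_states, dones,
                                  gamma=0.99, eta=0.1):
        # Assumes value_fn(s) returns V(s) with shape [batch, 1] and
        # policy(s) returns a torch.distributions.Distribution over actions.
        with torch.no_grad():
            v = value_fn(states).squeeze(-1)
            v_next = value_fn(next_states).squeeze(-1)
            # One-step TD error of the critic: delta = r + gamma * V(s') - V(s).
            td_error = rewards + gamma * (1.0 - dones) * v_next - v

        # Penalize the one-step advantage (here simply delta) by the squared
        # TD error, so the actor takes smaller steps where the critic is
        # inaccurate (large |delta|).
        regularized_advantage = td_error - eta * td_error ** 2

        log_prob = policy(states).log_prob(actions)
        return -(log_prob * regularized_advantage).mean()

Minimizing this loss follows the gradient E[∇log π(a|s)(δ − η δ²)], i.e., the usual policy-gradient step with the TD penalty folded into the advantage; the critic itself is trained as usual on the squared TD error.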
Pages: 1467-1501
Number of pages: 34
Related papers
50 records in total
  • [31] Optimal Actor-Critic Policy With Optimized Training Datasets
    Banerjee, Chayan
    Chen, Zhiyong
    Noman, Nasimul
    Zamani, Mohsen
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2022, 6 (06): 1324 - 1334
  • [32] Efficient data use in incremental actor-critic algorithms
    Cheng, Yuhu
    Feng, Huanting
    Wang, Xuesong
    NEUROCOMPUTING, 2013, 116 : 346 - 354
  • [33] Boosting On-Policy Actor-Critic With Shallow Updates in Critic
    Li, Luntong
    Zhu, Yuanheng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024: 1 - 10
  • [34] Advantage Actor-Critic for Autonomous Intersection Management
    Ayeelyan, John
    Lee, Guan-Hung
    Hsu, Hsiu-Chun
    Hsiung, Pao-Ann
    VEHICLES, 2022, 4 (04): 1391 - 1412
  • [35] Sustainable l2-Regularized Actor-Critic based on Recursive Least-Squares Temporal Difference Learning
    Li, Luntong
    Li, Dazi
    Song, Tianheng
    2017 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2017, : 1886 - 1891
  • [36] Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning
    Veeriah, Vivek
    van Seijen, Harm
    Sutton, Richard S.
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 556 - 564
  • [37] A World Model for Actor-Critic in Reinforcement Learning
    Panov, A. I.
    Ugadiarov, L. A.
    PATTERN RECOGNITION AND IMAGE ANALYSIS, 2023, 33 (03) : 467 - 477
  • [38] Actor-Critic Algorithm with Transition Cost Estimation
    Sergey, Denisov
    Lee, Jee-Hyong
    INTERNATIONAL JOURNAL OF FUZZY LOGIC AND INTELLIGENT SYSTEMS, 2016, 16 (04) : 270 - 275
  • [39] Diffusion welding furnace temperature controller based on Actor-Critic
    Li, Qiang
    Li, Gang
    Wang, Xing
    Wei, Min
    PROCEEDINGS OF THE 38TH CHINESE CONTROL CONFERENCE (CCC), 2019, : 2484 - 2487
  • [40] Finite-Time Analysis of Natural Actor-Critic for POMDPs
    Cayci, Semih
    He, Niao
    Srikant, R.
    SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2024, 6 (04): 869 - 896