TD-regularized actor-critic methods

Cited by: 1
Authors
Simone Parisi
Voot Tangkaratt
Jan Peters
Mohammad Emtiyaz Khan
Affiliations
[1] Technische Universität Darmstadt
[2] RIKEN Center for Advanced Intelligence Project
[3] Max-Planck-Institut für Intelligente Systeme
Source
Machine Learning | 2019 / Volume 108
Keywords
Reinforcement learning; Actor-critic; Temporal difference
DOI
Not available
Abstract
Actor-critic methods can achieve impressive performance on difficult reinforcement learning problems, but they are also prone to instability. This is partly due to the interaction between the actor and the critic during learning, e.g., an inaccurate step taken by one of them can adversely affect the other and destabilize learning. To avoid such issues, we propose to regularize the learning objective of the actor by penalizing the temporal difference (TD) error of the critic. This improves stability by avoiding large steps in the actor update whenever the critic is highly inaccurate. The resulting method, which we call the TD-regularized actor-critic method, is a simple plug-and-play approach for improving the stability and overall performance of actor-critic methods, as evaluations on standard benchmarks confirm. Source code can be found at https://github.com/sparisi/td-reg.
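To make the idea concrete, below is a minimal, hypothetical sketch of a TD-regularized actor update in a DDPG-style setting with a Q-function critic. The network sizes, the penalty coefficient `eta`, and the dummy batch are illustrative assumptions, not values from the paper; consult the linked repository for the authors' actual implementation.

```python
# Hypothetical sketch of a TD-regularized actor loss (PyTorch).
# `eta`, the network shapes, and the dummy batch are illustrative only.
import torch
import torch.nn as nn

gamma, eta = 0.99, 0.1  # discount factor; TD-penalty weight (assumed)

# toy actor and critic for a 3-dim state and 1-dim bounded action
actor = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1), nn.Tanh())
critic = nn.Sequential(nn.Linear(3 + 1, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

def td_regularized_actor_loss(s, a, r, s_next, done):
    # usual deterministic-policy-gradient term: push pi toward high Q(s, pi(s))
    q_pi = critic(torch.cat([s, actor(s)], dim=-1)).squeeze(-1)
    # critic's TD error; the actor enters through the bootstrap action pi(s')
    q_next = critic(torch.cat([s_next, actor(s_next)], dim=-1)).squeeze(-1)
    q_sa = critic(torch.cat([s, a], dim=-1)).squeeze(-1)
    td_error = r + gamma * (1.0 - done) * q_next - q_sa
    # penalizing delta^2 discourages large actor steps wherever the critic
    # is inconsistent with its own bootstrap target
    return -q_pi.mean() + eta * (td_error ** 2).mean()

# one illustrative update on a random dummy batch of 32 transitions
s, a = torch.randn(32, 3), torch.rand(32, 1) * 2 - 1
r, s_next, done = torch.randn(32), torch.randn(32, 3), torch.zeros(32)
actor_opt.zero_grad()
td_regularized_actor_loss(s, a, r, s_next, done).backward()
actor_opt.step()
```

Because the bootstrap term Q(s', pi(s')) depends on the policy, the squared TD error contributes a gradient to the actor, damping its updates wherever the critic's estimates are still inconsistent.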
Pages: 1467-1501
Page count: 34
Related Papers
50 records in total
  • [21] Robust Actor-Critic With Relative Entropy Regulating Actor
    Cheng, Yuhu
    Huang, Longyang
    Chen, C. L. Philip
    Wang, Xuesong
    IEEE Transactions on Neural Networks and Learning Systems, 2023, 34 (11): 9054-9063
  • [22] Off-Policy Actor-Critic with Emphatic Weightings
    Graves, Eric
    Imani, Ehsan
    Kumaraswamy, Raksha
    White, Martha
    Journal of Machine Learning Research, 2023, 24
  • [23] Improving sample efficiency in Multi-Agent Actor-Critic methods
    Ye, Zhenhui
    Chen, Yining
    Jiang, Xiaohong
    Song, Guanghua
    Yang, Bowei
    Fan, Sheng
    Applied Intelligence, 2022, 52 (04): 3691-3704
  • [24] An Actor-Critic Method for Simulation-Based Optimization
    Li, Kuo
    Jia, Qing-Shan
    Yan, Jiaqi
    IFAC PapersOnLine, 2022, 55 (11): 7-12
  • [25] ACRE: Actor-Critic with Reward-Preserving Exploration
    Kapoutsis, Athanasios Ch.
    Koutras, Dimitrios I.
    Korkas, Christos D.
    Kosmatopoulos, Elias B.
    Neural Computing and Applications, 2023, 35: 22563-22576
  • [26] Looking Back on the Actor-Critic Architecture
    Barto, Andrew G.
    Sutton, Richard S.
    Anderson, Charles W.
    IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2021, 51 (01): 40-50
  • [28] A multi-agent reinforcement learning using Actor-Critic methods
    Li, Chun-Gui
    Wang, Meng
    Yuan, Qing-Neng
    Proceedings of the 2008 International Conference on Machine Learning and Cybernetics, Vols 1-7, 2008: 878-882
  • [29] An actor-critic model of saccade adaptation
    Inaba, Manabu
    Yamazaki, Tadashi
    BMC Neuroscience, 14 (Suppl 1)
  • [30] Genetic Network Programming with Actor-Critic
    Hatakeyama, Hiroyuki
    Mabu, Shingo
    Hirasawa, Kotaro
    Hu, Jinglu
    Journal of Advanced Computational Intelligence and Intelligent Informatics, 2007, 11 (01): 79-86