TD-regularized actor-critic methods

Cited by: 1
Authors
Simone Parisi
Voot Tangkaratt
Jan Peters
Mohammad Emtiyaz Khan
Affiliations
[1] Technische Universität Darmstadt
[2] RIKEN Center for Advanced Intelligence Project
[3] Max-Planck-Institut für Intelligente Systeme
Source
Machine Learning | 2019 / Vol. 108
Keywords
Reinforcement learning; Actor-critic; Temporal difference;
DOI
Not available
Abstract
Actor-critic methods can achieve incredible performance on difficult reinforcement learning problems, but they are also prone to instability. This is partly due to the interaction between the actor and critic during learning: for example, an inaccurate step taken by one of them can adversely affect the other and destabilize learning. To avoid such issues, we propose to regularize the learning objective of the actor by penalizing the temporal difference (TD) error of the critic. This improves stability by avoiding large actor updates whenever the critic is highly inaccurate. The resulting method, which we call the TD-regularized actor-critic method, is a simple plug-and-play approach for improving the stability and overall performance of actor-critic methods, and evaluations on standard benchmarks confirm this. The source code can be found at https://github.com/sparisi/td-reg.
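The following is a minimal sketch of the idea described in the abstract, assuming a one-step advantage actor-critic with a discrete policy in PyTorch. The networks, dimensions, the penalty weight eta, and the update function are illustrative assumptions rather than the paper's implementation (see the linked repository for that); the gradient of the TD penalty is estimated here with a score-function term.

# Minimal sketch of a TD-regularized actor-critic update (assumed one-step A2C setup).
# The actor objective is penalized by the critic's squared TD error, so the actor
# takes smaller steps where the critic is still inaccurate, as described in the abstract.
import torch
import torch.nn as nn

obs_dim, n_actions = 4, 2                      # illustrative sizes
actor = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, n_actions))
critic = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, 1))
opt_actor = torch.optim.Adam(actor.parameters(), lr=3e-4)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma, eta = 0.99, 0.1                         # discount and TD-penalty weight (assumed values)

def update(s, a, r, s_next, done):
    # One-step TD error of the critic.
    v = critic(s)
    td_target = r + gamma * (1.0 - done) * critic(s_next).detach()
    td_error = td_target - v

    # Critic update: standard squared TD-error loss.
    critic_loss = td_error.pow(2).mean()
    opt_critic.zero_grad()
    critic_loss.backward()
    opt_critic.step()

    # Actor update: policy gradient with the TD advantage, plus eta times the
    # squared TD error as a penalty. The penalty gradient is estimated with the
    # score-function (log-prob) trick; the paper derives the exact regularized gradient.
    log_prob = torch.distributions.Categorical(logits=actor(s)).log_prob(a)
    delta = td_error.detach().squeeze(-1)
    actor_loss = -(log_prob * delta).mean() + eta * (log_prob * delta.pow(2)).mean()
    opt_actor.zero_grad()
    actor_loss.backward()
    opt_actor.step()

# Illustrative call with a dummy batch of transitions.
s = torch.randn(8, obs_dim)
a = torch.randint(n_actions, (8,))
r = torch.randn(8, 1)
s_next = torch.randn(8, obs_dim)
done = torch.zeros(8, 1)
update(s, a, r, s_next, done)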
Pages: 1467 - 1501
Number of pages: 34
Related Papers
50 records in total
  • [1] TD-regularized actor-critic methods
    Parisi, Simone
    Tangkaratt, Voot
    Peters, Jan
    Khan, Mohammad Emtiyaz
    MACHINE LEARNING, 2019, 108 (8-9) : 1467 - 1501
  • [2] Error controlled actor-critic
    Gao, Xingen
    Chao, Fei
    Zhou, Changle
    Ge, Zhen
    Yang, Longzhi
    Chang, Xiang
    Shang, Changjing
    Shen, Qiang
    INFORMATION SCIENCES, 2022, 612 : 62 - 74
  • [3] Master-Slave Policy Collaboration for Actor-Critic Methods
    Li, Xiaomu
    Liu, Quan
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [4] Natural Actor-Critic
    Peters, Jan
    Schaal, Stefan
    NEUROCOMPUTING, 2008, 71 (7-9) : 1180 - 1190
  • [5] On actor-critic algorithms
    Konda, VR
    Tsitsiklis, JN
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2003, 42 (04) : 1143 - 1166
  • [6] Efficient Model Learning Methods for Actor-Critic Control
    Grondman, Ivo
    Vaandrager, Maarten
    Busoniu, Lucian
    Babuska, Robert
    Schuitema, Erik
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2012, 42 (03): : 591 - 602
  • [7] Variational actor-critic algorithms
    Zhu, Yuhua
    Ying, Lexing
    ESAIM-CONTROL OPTIMISATION AND CALCULUS OF VARIATIONS, 2023, 29
  • [8] Multi-actor mechanism for actor-critic reinforcement learning
    Li, Lin
    Li, Yuze
    Wei, Wei
    Zhang, Yujia
    Liang, Jiye
    INFORMATION SCIENCES, 2023, 647
  • [9] The Effect of Discounting Actor-loss in Actor-Critic Algorithm
    Yaputra, Jordi
    Suyanto, Suyanto
    2021 4TH INTERNATIONAL SEMINAR ON RESEARCH OF INFORMATION TECHNOLOGY AND INTELLIGENT SYSTEMS (ISRITI 2021), 2020,
  • [10] Noisy Importance Sampling Actor-Critic: An Off-Policy Actor-Critic With Experience Replay
    Tasfi, Norman
    Capretz, Miriam
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,