TD-regularized actor-critic methods

Cited by: 1
Authors
Simone Parisi
Voot Tangkaratt
Jan Peters
Mohammad Emtiyaz Khan
Affiliations
[1] Technische Universität Darmstadt
[2] RIKEN Center for Advanced Intelligence Project
[3] Max-Planck-Institut für Intelligente Systeme
Source
Machine Learning | 2019 / Volume 108
Keywords
Reinforcement learning; Actor-critic; Temporal difference;
DOI
Not available
Abstract
Actor-critic methods can achieve incredible performance on difficult reinforcement learning problems, but they are also prone to instability. This is partly due to the interaction between the actor and critic during learning, e.g., an inaccurate step taken by one of them might adversely affect the other and destabilize the learning. To avoid such issues, we propose to regularize the learning objective of the actor by penalizing the temporal difference (TD) error of the critic. This improves stability by avoiding large steps in the actor update whenever the critic is highly inaccurate. The resulting method, which we call the TD-regularized actor-critic method, is a simple plug-and-play approach to improve stability and overall performance of the actor-critic methods. Evaluations on standard benchmarks confirm this. Source code can be found at https://github.com/sparisi/td-reg.
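The idea described in the abstract can be illustrated with a minimal sketch (not the authors' code; the function names and the squared-TD-error penalty with coefficient `eta` are assumptions based on the abstract): the one-step TD error of the critic is added as a penalty to the actor's objective, so the actor's update is damped wherever the critic is inaccurate.

```python
def td_error(reward, gamma, v_s, v_s_next):
    # One-step temporal-difference error of the critic:
    # delta = r + gamma * V(s') - V(s)
    return reward + gamma * v_s_next - v_s

def td_regularized_actor_loss(policy_loss, delta, eta=0.1):
    # Penalize the actor objective by the squared TD error, so the
    # actor takes smaller steps when the critic is highly inaccurate.
    return policy_loss + eta * delta ** 2

# Toy numbers for a single transition (illustrative only):
delta = td_error(reward=1.0, gamma=0.99, v_s=0.5, v_s_next=0.4)
loss = td_regularized_actor_loss(policy_loss=2.0, delta=delta)
```

In practice `policy_loss` would be any standard actor objective (e.g. a policy-gradient surrogate) averaged over a batch, and `eta` trades off stability against the original objective.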
Pages: 1467-1501
Page count: 34
Related papers
50 records in total
  • [41] Real-Time 'Actor-Critic' Tracking
    Chen, Boyu
    Wang, Dong
    Li, Peixia
    Wang, Shuang
    Lu, Huchuan
    COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211 : 328 - 345
  • [42] Bayesian Policy Gradient and Actor-Critic Algorithms
    Ghavamzadeh, Mohammad
    Engel, Yaakov
    Valko, Michal
    JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17
  • [43] A fuzzy Actor-Critic reinforcement learning network
    Wang, Xue-Song
    Cheng, Yu-Hu
    Yi, Jian-Qiang
    INFORMATION SCIENCES, 2007, 177 (18) : 3764 - 3781
  • [44] Actor-critic with familiarity-based trajectory experience replay
    Gong, Xiaoyu
    Yu, Jiayu
    Lu, Shuai
    Lu, Hengwei
    INFORMATION SCIENCES, 2022, 582 : 633 - 647
  • [45] A Soft Actor-Critic Algorithm for Sequential Recommendation
    Hong, Hyejin
    Kimura, Yusuke
    Hatano, Kenji
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PT I, DEXA 2024, 2024, 14910 : 258 - 266
  • [46] Classical Actor-Critic Applied to the Control of a Self-Regulatory Process
    Bras, E. H.
    Louw, T. M.
    Bradshaw, S. M.
    IFAC PAPERSONLINE, 2023, 56 (02): : 7172 - 7177
  • [47] A Sample-Efficient Actor-Critic Algorithm for Recommendation Diversification
    Li, Shuang
    Yan, Yanghui
    Ren, Ju
    Zhou, Yuezhi
    Zhang, Yaoxue
    CHINESE JOURNAL OF ELECTRONICS, 2020, 29 (01) : 89 - 96
  • [48] Asymmetric Actor-Critic for Adapting to Changing Environments in Reinforcement Learning
    Yue, Wangyang
    Zhou, Yuan
    Zhang, Xiaochuan
    Hua, Yuchen
    Li, Minne
    Fan, Zunlin
    Wang, Zhiyuan
    Kou, Guang
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT IV, 2024, 15019 : 325 - 339
  • [49] Actor-critic learning based PID control for robotic manipulators
    Nohooji, Hamed Rahimi
    Zaraki, Abolfazl
    Voos, Holger
    APPLIED SOFT COMPUTING, 2024, 151