Addressing Function Approximation Error in Actor-Critic Methods

Cited by: 0
Authors
Fujimoto, Scott [1 ]
van Hoof, Herke [2 ]
Meger, David [1 ]
Affiliations
[1] McGill Univ, Montreal, PQ, Canada
[2] Univ Amsterdam, Amsterdam, Netherlands
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80 | 2018 / Vol. 80
Keywords
BIAS;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested.
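The abstract's two key mechanisms, forming the learning target from the minimum of a pair of critic estimates and updating the policy less frequently than the critics, can be summarized in a short sketch. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the discount factor of 0.99, the termination-flag convention, and the update period of 2 are all choices made for the example.

```python
import numpy as np

def clipped_double_q_target(rewards, next_q1, next_q2, dones, gamma=0.99):
    """Bellman target built from the minimum of two critic estimates.

    Taking min(Q1, Q2) at the next state-action pair bounds the target
    from below, counteracting the overestimation bias that a single
    maximizing critic tends to accumulate.
    """
    min_next_q = np.minimum(next_q1, next_q2)
    return rewards + gamma * (1.0 - dones) * min_next_q

def should_update_policy(step, policy_delay=2):
    """Delayed policy updates: refresh the actor (and target networks)
    only every `policy_delay` critic updates, so per-update critic error
    is reduced before it propagates into the policy."""
    return step % policy_delay == 0

# Illustrative usage with made-up batch values.
r  = np.array([1.0, 0.0])
q1 = np.array([10.0, 5.0])   # critic 1 estimates at the next state-action
q2 = np.array([9.0, 6.0])    # critic 2 estimates at the next state-action
d  = np.array([0.0, 1.0])    # episode-termination flags
print(clipped_double_q_target(r, q1, q2, d))  # -> [9.91, 0.0]
```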
Pages: 10