Mixing Update Q-value for Deep Reinforcement Learning

Cited by: 4
Authors
Li, Zhunan [1, 2]
Hou, Xinwen [1]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Ctr Res Intelligent Syst & Engn, Beijing, Peoples R China
[2] Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
Source
2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2019
Funding
National Key Research and Development Program of China
DOI
10.1109/ijcnn.2019.8852397
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Value-based reinforcement learning methods such as deep Q-learning are known to overestimate action values, which can lead to suboptimal policies. The problem also persists in actor-critic algorithms. In this paper, we propose a novel mechanism that minimizes its effects on both the critic and the actor. Our mechanism builds on Double Q-learning: it mixes the action-value update using the minimum and the maximum of a pair of critics in order to limit overestimation. We then propose a specific adaptation of the Twin Delayed Deep Deterministic policy gradient algorithm (TD3) and show that the resulting algorithm not only reduces the observed overestimation, as hypothesized, but also achieves much better performance on several tasks.
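The abstract describes the mixed update only at a high level and does not state the exact mixing rule. The sketch below assumes one plausible form: a convex combination of the two critics' minimum and maximum, weighted by a hypothetical parameter `beta`, placed inside a one-step TD target. The function name `mixed_target` and the parameter `beta` are illustrative assumptions, not taken from the paper. With `beta = 1.0` the expression reduces to the clipped double-Q target that TD3 uses.

```python
def mixed_target(reward: float,
                 q1_next: float,
                 q2_next: float,
                 gamma: float = 0.99,
                 beta: float = 0.75,
                 done: bool = False) -> float:
    """Hypothetical mixed one-step TD target from a pair of critics.

    ASSUMPTION: the mixing rule is beta * min + (1 - beta) * max.
    beta weights the pessimistic (min) estimate, 1 - beta the optimistic (max) one.
    beta = 1.0 recovers TD3's clipped double-Q target.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    # Order the two next-state critic estimates.
    lo, hi = min(q1_next, q2_next), max(q1_next, q2_next)
    # Blend the pessimistic and optimistic estimates.
    mixed = beta * lo + (1.0 - beta) * hi
    # No bootstrapping from a terminal next state.
    bootstrap = 0.0 if done else gamma * mixed
    return reward + bootstrap


if __name__ == "__main__":
    # Critics disagree: 10.0 vs 12.0. The mixed value is 0.75 * 10 + 0.25 * 12 = 10.5.
    print(mixed_target(1.0, 10.0, 12.0, gamma=0.9, beta=0.75))
    # beta = 1.0 uses only the minimum, as in TD3.
    print(mixed_target(1.0, 10.0, 12.0, gamma=0.9, beta=1.0))
```

In a TD3-style learner, `q1_next` and `q2_next` would come from the two target critics evaluated at the target actor's noise-smoothed next action, and the same expression would be applied element-wise over a minibatch. Choosing `beta` below 1 trades some of TD3's pessimism for less underestimation; how the paper actually sets this trade-off is not stated in the abstract.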
Pages: 6
References
15 in total
[1] [Anonymous], 2018, REINFORCEMENT LEARNI
[2] [Anonymous], 1993, P 1993 CONN MOD SUMM
[3] Cheung V, 2016, OPENAI GYM
[4] Cornish Christopher John, 1989, Ph.D. thesis
[5] Fujimoto S, 2018, PR MACH LEARN RES, V80
[6] Kingma DP, 2014, ADV NEUR IN, V27
[7] Lillicrap TP, 2016, CoRR, abs/1509.02971, P1
[9] Mnih V, 2013, Asynchronous methods for deep reinforcement learning, V1312, P5602
[10] Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D, 2015, Human-level control through deep reinforcement learning, NATURE, V518 (7540), P529-533