Mixing Update Q-value for Deep Reinforcement Learning

Cited by: 4

Authors:

Li, Zhunan [1,2]

Hou, Xinwen [1]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, Ctr Res Intelligent Syst & Engn, Beijing, Peoples R China

[2] Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China

Source:

2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2019

Funding:

National Key Research and Development Program of China;

Keywords:

DOI:

10.1109/ijcnn.2019.8852397

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Value-based reinforcement learning methods such as deep Q-learning are known to overestimate action values, which can lead to suboptimal policies. This problem also persists in actor-critic algorithms. In this paper, we propose a novel mechanism to minimize its effects on both the critic and the actor. Our mechanism builds on Double Q-learning: it updates the action value with a mixture of the minimum and the maximum of a pair of critics' estimates to limit overestimation. We then propose a specific adaptation to the Twin Delayed Deep Deterministic policy gradient algorithm (TD3) and show that the resulting algorithm not only reduces the observed overestimation, as hypothesized, but also performs much better on several tasks.
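The abstract does not state the exact update rule, so the following is only a minimal sketch of one plausible reading: the bootstrapped target blends the minimum and the maximum of the two critics' next-state estimates. The function name `mixed_target`, the mixing weight `beta`, and the convex-combination form are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def mixed_target(
    rewards: Sequence[float],
    dones: Sequence[bool],
    q1_next: Sequence[float],
    q2_next: Sequence[float],
    gamma: float = 0.99,
    beta: float = 0.75,
) -> List[float]:
    """Compute one-step targets from a blend of two critics' estimates.

    `beta` is a hypothetical mixing weight (an assumption, not from the
    abstract). beta = 1 recovers the pure-minimum target of clipped
    double Q-learning as used in TD3; beta = 0 uses the pure maximum.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    targets = []
    for r, done, q1, q2 in zip(rewards, dones, q1_next, q2_next):
        # Convex combination of the pessimistic and optimistic estimates.
        mixed = beta * min(q1, q2) + (1.0 - beta) * max(q1, q2)
        # Terminal transitions do not bootstrap from the next state.
        targets.append(r + (0.0 if done else gamma * mixed))
    return targets
```

Under this reading, a weight close to 1 leans on the lower estimate and so pushes against overestimation, while a smaller weight lets the higher estimate offset the underestimation that a pure minimum can introduce.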

Pages: 6
