Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning

Cited by: 92
Authors
Ren, Zhipeng [1 ]
Dong, Daoyi [2 ]
Li, Huaxiong [1 ]
Chen, Chunlin [1 ]
Affiliations
[1] Nanjing Univ, Dept Control & Syst Engn, Sch Management & Engn, Nanjing 210093, Jiangsu, Peoples R China
[2] Univ New South Wales, Sch Engn & Informat Technol, Canberra, ACT 0200, Australia
Funding
National Natural Science Foundation of China; Australian Research Council;
关键词
Coverage penalty; curriculum learning; deep reinforcement learning; self-paced priority; algorithm;
DOI
10.1109/TNNLS.2018.2790981
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) takes full advantage of experience replay by adaptively selecting transitions from the replay memory according to the complexity of each transition. The complexity criterion in DCRL combines a self-paced priority with a coverage penalty: the self-paced priority relates the temporal-difference error to the difficulty of the current curriculum and promotes sample efficiency, while the coverage penalty promotes sample diversity. The DCRL algorithm is evaluated on Atari 2600 games against the deep Q-network (DQN) and prioritized experience replay (PER) methods, and the results show that DCRL outperforms both on most of these games. Further results show that the curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as the double DQN and the dueling network. All the experimental results demonstrate that DCRL achieves improved training efficiency and robustness for deep reinforcement learning.
Pages: 2216-2226
Page count: 11
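
To make the selection mechanism in the abstract concrete, the following is a minimal Python sketch of a replay buffer that ranks transitions by such a complexity score. The class name DCRLReplayBuffer, the Gaussian-shaped self-paced score, and the linear count-based coverage penalty are illustrative assumptions for exposition, not the exact formulas of the paper (see the DOI above for the published definitions).

import numpy as np

class DCRLReplayBuffer:
    """Sketch of a replay buffer that samples transitions by a
    complexity score: self-paced priority minus a coverage penalty.
    The functional forms below are assumptions, not the paper's."""

    def __init__(self, capacity, difficulty=1.0, penalty_coef=0.1):
        self.capacity = capacity
        self.difficulty = difficulty      # current curriculum difficulty level
        self.penalty_coef = penalty_coef  # weight of the coverage penalty
        self.transitions = []             # stored (s, a, r, s') tuples
        self.td_errors = []               # |TD error| of each transition
        self.replay_counts = []           # times each transition was replayed

    def add(self, transition, td_error):
        # Evict the oldest transition when the buffer is full.
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.td_errors.pop(0)
            self.replay_counts.pop(0)
        self.transitions.append(transition)
        self.td_errors.append(abs(td_error))
        self.replay_counts.append(0)

    def _complexity(self):
        delta = np.asarray(self.td_errors)
        counts = np.asarray(self.replay_counts, dtype=float)
        # Self-paced priority: prefer TD errors that match the current
        # curriculum difficulty (Gaussian-shaped score, an assumption).
        sp_priority = np.exp(-((delta - self.difficulty) ** 2))
        # Coverage penalty: discourage replaying the same transitions
        # over and over, which promotes sample diversity.
        coverage = self.penalty_coef * counts
        return np.maximum(sp_priority - coverage, 1e-6)

    def sample(self, batch_size):
        # Turn complexity scores into a sampling distribution.
        scores = self._complexity()
        probs = scores / scores.sum()
        idx = np.random.choice(len(self.transitions), batch_size, p=probs)
        for i in idx:
            self.replay_counts[i] += 1
        return [self.transitions[i] for i in idx]

A training loop would refresh td_errors after each learning step and gradually adjust difficulty as the curriculum advances (the annealing schedule is likewise an assumption), so that easier transitions dominate early sampling and harder ones take over later.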