Pseudo-rehearsal: Achieving deep reinforcement learning without catastrophic forgetting

Cited by: 41
Authors
Atkinson, Craig [1 ]
McCane, Brendan [1 ]
Szymanski, Lech [1 ]
Robins, Anthony [1 ]
Affiliations
[1] Univ Otago, Dept Comp Sci, 133 Union St East, Dunedin, New Zealand
Keywords
Deep reinforcement learning; Pseudo-rehearsal; Catastrophic forgetting; Generative adversarial network;
DOI
10.1016/j.neucom.2020.11.050
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Neural networks can achieve excellent results in a wide variety of applications. However, when they learn tasks sequentially, they tend to master the new task while catastrophically forgetting previous ones. We propose a model that overcomes catastrophic forgetting in sequential reinforcement learning by combining ideas from continual learning in both the image classification and reinforcement learning domains. This model features a dual memory system, which separates continual learning from reinforcement learning, and a pseudo-rehearsal system that "recalls" items representative of previous tasks via a deep generative network. Our model sequentially learns three Atari 2600 games without exhibiting catastrophic forgetting and continues to perform above human level on all of them. This result is achieved without demanding additional storage as the number of tasks increases, without storing raw data, and without revisiting past tasks. In comparison, previous state-of-the-art solutions are substantially more vulnerable to forgetting on these complex deep reinforcement learning tasks. (C) 2020 Elsevier B.V. All rights reserved.
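The pseudo-rehearsal idea in the abstract can be illustrated with a toy sketch. This is a deliberate simplification, not the paper's architecture: a linear least-squares model stands in for the deep reinforcement learner, and a fixed Gaussian prior stands in for the learned generative (GAN) model. The key mechanism is the same, though: pseudo-items are sampled from a generator rather than stored, labeled with the *old* model's own outputs, and mixed into training on the new task so that old behavior is retained.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(w, x):
    """Toy 'network': a linear map (stand-in for the policy/value network)."""
    return x @ w

# --- Task A: fit the model by least squares ---
x_a = rng.normal(size=(200, 4))
w_true_a = np.array([1.0, -2.0, 0.5, 3.0])
y_a = x_a @ w_true_a
w_old = np.linalg.lstsq(x_a, y_a, rcond=None)[0]

# --- Pseudo-rehearsal: no task-A data is stored. Instead, sample
# pseudo-inputs from a generator (here, the Gaussian prior) and label
# them with the old model's own outputs (pseudo-targets, not ground truth).
x_pseudo = rng.normal(size=(200, 4))
y_pseudo = predict(w_old, x_pseudo)

# --- Task B data ---
x_b = rng.normal(size=(200, 4))
w_true_b = np.array([-1.0, 0.5, 2.0, -0.5])
y_b = x_b @ w_true_b

# Learning task B alone overwrites task A (catastrophic forgetting)...
w_forget = np.linalg.lstsq(x_b, y_b, rcond=None)[0]

# ...whereas mixing in pseudo-rehearsal items preserves task-A behavior.
x_mix = np.vstack([x_b, x_pseudo])
y_mix = np.concatenate([y_b, y_pseudo])
w_rehearsed = np.linalg.lstsq(x_mix, y_mix, rcond=None)[0]

# Compare mean squared error on the *old* task.
err_forget = np.mean((predict(w_forget, x_a) - y_a) ** 2)
err_rehearsed = np.mean((predict(w_rehearsed, x_a) - y_a) ** 2)
print(f"task-A error, no rehearsal:   {err_forget:.3f}")
print(f"task-A error, pseudo-rehearsal: {err_rehearsed:.3f}")
```

In the sketch the rehearsed model's task-A error is far lower than the naively retrained one's, which is the effect pseudo-rehearsal targets; the paper's contribution is making this work at scale, with a GAN as the generator and a dual memory system separating the reinforcement learner from the continually learning network.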
Pages: 291-307
Page count: 17
Cited references
41 records
[1]  
Abraham W.C., npj Science of Learning, V4
[2]   Memory retention - the synaptic stability versus plasticity dilemma [J].
Abraham, WC ;
Robins, A .
TRENDS IN NEUROSCIENCES, 2005, 28 (02) :73-78
[3]  
[Anonymous], 2014, International Conference on Learning Representations
[4]  
[Anonymous], arXiv:1606.04671
[5]  
Atkinson C., arXiv:1802.03875
[6]   Reinforcement learning in continuous time and space: Interference and not ill conditioning is the main problem when using distributed function approximators [J].
Baddeley, Bart .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2008, 38 (04) :950-956
[7]  
Berseth Glen, 2018, International Conference on Learning Representations
[8]  
Caselles-Dupre H., arXiv:1902.09434
[9]  
Caselles-Dupre H., arXiv:1810.03880
[10]   Sleep transforms the cerebral trace of declarative memories [J].
Gais, Steffen ;
Albouy, Genevieve ;
Boly, Melanie ;
Dang-Vu, Thien Thanh ;
Darsaud, Annabelle ;
Desseilles, Martin ;
Rauchs, Geraldine ;
Schabus, Manuel ;
Sterpenich, Virginie ;
Vandewalle, Gilles ;
Maquet, Pierre ;
Peigneux, Philippe .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2007, 104 (47) :18778-18783