Pseudo-rehearsal: Achieving deep reinforcement learning without catastrophic forgetting

Cited by: 41
Authors
Atkinson, Craig [1 ]
McCane, Brendan [1 ]
Szymanski, Lech [1 ]
Robins, Anthony [1 ]
Affiliations
[1] Univ Otago, Dept Comp Sci, 133 Union St East, Dunedin, New Zealand
Keywords
Deep reinforcement learning; Pseudo-rehearsal; Catastrophic forgetting; Generative adversarial network;
DOI
10.1016/j.neucom.2020.11.050
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Neural networks can achieve excellent results in a wide variety of applications. However, when they learn tasks sequentially, they tend to master the new task while catastrophically forgetting previous ones. We propose a model that overcomes catastrophic forgetting in sequential reinforcement learning by combining ideas from continual learning in both the image classification and reinforcement learning domains. This model features a dual memory system, which separates continual learning from reinforcement learning, and a pseudo-rehearsal system that "recalls" items representative of previous tasks via a deep generative network. Our model sequentially learns three Atari 2600 games without exhibiting catastrophic forgetting and continues to perform above human level on all of them. This result is achieved without demanding additional storage as the number of tasks increases, without storing raw data, and without revisiting past tasks. In comparison, previous state-of-the-art solutions are substantially more vulnerable to forgetting on these complex deep reinforcement learning tasks. (C) 2020 Elsevier B.V. All rights reserved.
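The pseudo-rehearsal idea in the abstract can be illustrated with a toy sketch. This is a deliberate simplification, not the paper's architecture: a linear least-squares model stands in for the deep reinforcement learner, and a fixed Gaussian prior stands in for the learned generative (GAN) model. The key mechanism is the same, though: pseudo-items are sampled from a generator rather than stored, labeled with the *old* model's own outputs, and mixed into training on the new task so that old behavior is retained.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(w, x):
    """Toy 'network': a linear map (stand-in for the policy/value network)."""
    return x @ w

# --- Task A: fit the model by least squares ---
x_a = rng.normal(size=(200, 4))
w_true_a = np.array([1.0, -2.0, 0.5, 3.0])
y_a = x_a @ w_true_a
w_old = np.linalg.lstsq(x_a, y_a, rcond=None)[0]

# --- Pseudo-rehearsal: no task-A data is stored. Instead, sample
# pseudo-inputs from a generator (here, the Gaussian prior) and label
# them with the old model's own outputs (pseudo-targets, not ground truth).
x_pseudo = rng.normal(size=(200, 4))
y_pseudo = predict(w_old, x_pseudo)

# --- Task B data ---
x_b = rng.normal(size=(200, 4))
w_true_b = np.array([-1.0, 0.5, 2.0, -0.5])
y_b = x_b @ w_true_b

# Learning task B alone overwrites task A (catastrophic forgetting)...
w_forget = np.linalg.lstsq(x_b, y_b, rcond=None)[0]

# ...whereas mixing in pseudo-rehearsal items preserves task-A behavior.
x_mix = np.vstack([x_b, x_pseudo])
y_mix = np.concatenate([y_b, y_pseudo])
w_rehearsed = np.linalg.lstsq(x_mix, y_mix, rcond=None)[0]

# Compare mean squared error on the *old* task.
err_forget = np.mean((predict(w_forget, x_a) - y_a) ** 2)
err_rehearsed = np.mean((predict(w_rehearsed, x_a) - y_a) ** 2)
print(f"task-A error, no rehearsal:   {err_forget:.3f}")
print(f"task-A error, pseudo-rehearsal: {err_rehearsed:.3f}")
```

In the sketch the rehearsed model's task-A error is far lower than the naively retrained one's, which is the effect pseudo-rehearsal targets; the paper's contribution is making this work at scale, with a GAN as the generator and a dual memory system separating the reinforcement learner from the continually learning network.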
Pages: 291-307
Page count: 17
Cited references
41 records
[1]  
Abraham W.C., npj Science of Learning, V4
[2]   Memory retention - the synaptic stability versus plasticity dilemma [J].
Abraham, WC ;
Robins, A .
TRENDS IN NEUROSCIENCES, 2005, 28 (02) :73-78
[3]  
[Anonymous], 2014, International Conference on Learning Representations
[4]  
[Anonymous], arXiv:1606.04671
[5]  
Atkinson C., arXiv:1802.03875
[6]   Reinforcement learning in continuous time and space: Interference and not ill conditioning is the main problem when using distributed function approximators [J].
Baddeley, Bart .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2008, 38 (04) :950-956
[7]  
Berseth Glen, 2018, International Conference on Learning Representations
[8]  
Caselles-Dupre H., arXiv:1902.09434
[9]  
Caselles-Dupre H., arXiv:1810.03880
[10]   Sleep transforms the cerebral trace of declarative memories [J].
Gais, Steffen ;
Albouy, Genevieve ;
Boly, Melanie ;
Dang-Vu, Thien Thanh ;
Darsaud, Annabelle ;
Desseilles, Martin ;
Rauchs, Geraldine ;
Schabus, Manuel ;
Sterpenich, Virginie ;
Vandewalle, Gilles ;
Maquet, Pierre ;
Peigneux, Philippe .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2007, 104 (47) :18778-18783