Prediction and Control in Continual Reinforcement Learning

Cited by: 0

Authors:

Anand, Nishanth [1,2]

Precup, Doina [1,3]

Affiliations:

[1] McGill Univ, Sch Comp Sci, Montreal, PQ, Canada

[2] Mila, Montreal, PQ, Canada

[3] DeepMind, London, England

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

GAME; GO;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Temporal difference (TD) learning is often used to update the estimate of the value function which is used by RL agents to extract useful policies. In this paper, we focus on value function estimation in continual reinforcement learning. We propose to decompose the value function into two components which update at different timescales: a permanent value function, which holds general knowledge that persists over time, and a transient value function, which allows quick adaptation to new situations. We establish theoretical results showing that our approach is well suited for continual learning and draw connections to the complementary learning systems (CLS) theory from neuroscience. Empirically, this approach improves performance significantly on both prediction and control problems.
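The decomposition described in the abstract can be illustrated with a small tabular sketch. This is not the authors' algorithm: the abstract gives no update rules, so everything below is an assumption made for illustration, including the class name `PermanentTransientValue`, the step sizes `alpha_fast` and `alpha_slow`, the additive combination of the two components, and the `consolidate` and `reset_transient` steps. The only ideas taken from the abstract are that there are two components and that they update at different timescales.

```python
class PermanentTransientValue:
    """Hypothetical tabular value estimate split into a slow and a fast part."""

    def __init__(self, n_states, alpha_fast=0.5, alpha_slow=0.1, gamma=0.9):
        self.permanent = [0.0] * n_states  # slow component: persistent knowledge
        self.transient = [0.0] * n_states  # fast component: quick adaptation
        self.alpha_fast = alpha_fast       # assumed step size for the fast part
        self.alpha_slow = alpha_slow       # assumed step size for the slow part
        self.gamma = gamma                 # discount factor

    def value(self, s):
        # Assumption: the overall estimate is the sum of both components.
        return self.permanent[s] + self.transient[s]

    def td_update(self, s, r, s_next, done):
        # One-step TD error computed on the combined estimate.
        bootstrap = 0.0 if done else self.gamma * self.value(s_next)
        delta = r + bootstrap - self.value(s)
        # Only the fast component absorbs the error at every step.
        self.transient[s] += self.alpha_fast * delta
        return delta

    def consolidate(self, states):
        # Slow timescale: move the permanent part a small step toward the
        # combined estimate, i.e. add a fraction of the transient part.
        for s in states:
            self.permanent[s] += self.alpha_slow * self.transient[s]

    def reset_transient(self):
        # Assumed to happen when the situation changes, so that the fast
        # part can adapt from scratch.
        self.transient = [0.0] * len(self.transient)


agent = PermanentTransientValue(n_states=2)
delta = agent.td_update(s=0, r=1.0, s_next=1, done=True)
agent.consolidate(states=[0])
agent.reset_transient()
```

In this sketch the fast component is updated on every transition, while the slow component changes only when `consolidate` is called, which is one simple way to realize two timescales.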

Pages: 39

50 items in total

[1] SLER: Self-generated long-term experience replay for continual reinforcement learning
Li, Chunmao
Li, Yang
Zhao, Yinliang
Peng, Peng
Geng, Xupeng
APPLIED INTELLIGENCE, 2021, 51 (01) : 185 - 201
[2] Reinforcement Learning for Control with Multiple Frequencies
Lee, Jongmin
Lee, Byung-Jun
Kim, Kee-Eung
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
[3] Deep reinforcement learning control of hydraulic fracturing
Bangi, Mohammed Saad Faizan
Kwon, Joseph Sang-Il
COMPUTERS & CHEMICAL ENGINEERING, 2021, 154
[4] Neural Malware Control with Deep Reinforcement Learning
Wang, Yu
Stokes, Jack W.
Marinescu, Mady
MILCOM 2019 - 2019 IEEE MILITARY COMMUNICATIONS CONFERENCE (MILCOM), 2019
[5] Experience Selection in Deep Reinforcement Learning for Control
de Bruin, Tim
Kober, Jens
Tuyls, Karl
Babuska, Robert
JOURNAL OF MACHINE LEARNING RESEARCH, 2018, 19
[6] Satellite Attitude Control with Deep Reinforcement Learning
Gao, Duozhi
Zhang, Haibo
Li, Chuanjiang
Gao, Xinzhou
2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 4095 - 4101
[7] Deep reinforcement learning for quantum gate control
An, Zheng
Zhou, D. L.
EPL, 2019, 126 (06)
[8] Integrating Classical Control into Reinforcement Learning Policy
Huang, Ye
Gu, Chaochen
Guan, Xinping
NEURAL PROCESSING LETTERS, 2021, 53 (03) : 1709 - 1722
[9] Autonomous Vehicles Roundup Strategy by Reinforcement Learning with Prediction Trajectory
Ni, Jiayang
Ma, Rubing
Zhong, Hua
Wang, Bo
2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022, : 3370 - 3375
[10] A Deep Reinforcement Learning Perspective on Internet Congestion Control
Jay, Nathan
Rotman, Noga H.
Godfrey, P. Brighten
Schapira, Michael
Tamar, Aviv
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97