Conservative network for offline reinforcement learning

Cited by: 1
Authors
Peng, Zhiyong [1 ]
Liu, Yadong [1 ]
Chen, Haoqiang [1 ]
Zhou, Zongtan [1 ]
Affiliations
[1] Natl Univ Def Technol, Coll Intelligence Sci & Technol, Changsha 410000, Hunan, Peoples R China
Keywords
Reinforcement learning; Offline reinforcement learning; OOD prediction; Activation functions; Ensemble methods; GO; LEVEL; GAME
DOI
10.1016/j.knosys.2023.111101
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Offline reinforcement learning (RL) aims to learn policies from static datasets. Value overestimation on out-of-distribution (OOD) actions makes it difficult to apply standard RL methods directly in the offline setting. To overcome this problem, many works estimate the value function conservatively or pessimistically. However, existing methods require additional OOD sampling or uncertainty estimation to underestimate OOD values, which makes them complex and sensitive to hyperparameters. Is it possible to design a value function that is automatically conservative on OOD samples? In this study, we reveal the anti-conservation property of the widely used ReLU network under certain conditions and explain its cause theoretically. Based on this analysis of the ReLU network, we propose a novel neural network architecture that pushes down the values of samples far from the dataset; we call this new architecture the Conservative Network (ConsNet). Building on ConsNet, we propose a new offline RL algorithm that is simple to implement and achieves high performance. Because ConsNet itself supplies additional conservatism, integrating it into several existing offline RL methods significantly improves their performance or reduces their complexity. Given its simplicity and effectiveness, we hope ConsNet can serve as a new fundamental network architecture for offline RL.
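The abstract does not spell out ConsNet's internals, but its stated design goal (a network whose output is pushed down for inputs far from the dataset, in contrast to a ReLU MLP, which is piecewise linear and extrapolates linearly, so it can grow without bound on OOD inputs) can be illustrated with a rough sketch. The snippet below is a minimal illustration under assumptions, not the paper's actual architecture: the names ConservativeNet, anchors, and lam are hypothetical, and the distance-to-data penalty is one simple way to realize the stated behavior.

```python
# Minimal sketch (NOT the paper's ConsNet): a value network whose output
# is pushed down for inputs far from the offline dataset. The distance
# penalty used here is an illustrative assumption.
import torch
import torch.nn as nn


class ReLUNet(nn.Module):
    """Plain ReLU MLP: piecewise linear, so it extrapolates linearly
    outside the training data and can grow without bound (the
    'anti-conservation' behavior the abstract refers to)."""

    def __init__(self, dim=1, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)


class ConservativeNet(nn.Module):
    """Hypothetical conservative variant: subtract a distance-to-data
    penalty so predicted values decay for OOD inputs. `anchors` stands
    in for (a subset of) the offline dataset."""

    def __init__(self, anchors, dim=1, hidden=64, lam=1.0):
        super().__init__()
        self.backbone = ReLUNet(dim, hidden)
        self.register_buffer("anchors", anchors)  # shape (N, dim)
        self.lam = lam

    def forward(self, x):
        # Distance from each input to its nearest in-distribution anchor;
        # near the data this is ~0 and the backbone dominates.
        d = torch.cdist(x, self.anchors).min(dim=1, keepdim=True).values
        return self.backbone(x) - self.lam * d  # push down far-away values


if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy in-distribution inputs in [-1, 1] with a synthetic value target.
    x_data = torch.rand(256, 1) * 2 - 1
    y_data = torch.sin(3 * x_data)
    net = ConservativeNet(anchors=x_data, lam=2.0)
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(500):
        opt.zero_grad()
        loss = ((net(x_data) - y_data) ** 2).mean()
        loss.backward()
        opt.step()
    # Far from the data the penalty dominates, so the predicted value
    # drops instead of extrapolating upward as a plain ReLU net might.
    x_ood = torch.tensor([[0.0], [2.0], [5.0]])
    print(net(x_ood).squeeze().tolist())
```

The sketch bakes conservatism into the architecture itself rather than into the training objective, which is the property the abstract claims lets ConsNet simplify or strengthen existing offline RL methods; the actual mechanism in the paper may differ.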
Pages: 10