Conservative network for offline reinforcement learning

Cited by: 1
Authors
Peng, Zhiyong [1 ]
Liu, Yadong [1 ]
Chen, Haoqiang [1 ]
Zhou, Zongtan [1 ]
Affiliations
[1] Natl Univ Def Technol, Coll Intelligence Sci & Technol, Changsha 410000, Hunan, Peoples R China
Keywords
Reinforcement learning; Offline reinforcement learning; OOD prediction; Activation functions; Ensemble methods; GO; LEVEL; GAME
DOI
10.1016/j.knosys.2023.111101
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Offline reinforcement learning (RL) aims to learn policies from static datasets. Value overestimation on out-of-distribution (OOD) actions makes it difficult to apply standard RL methods directly in the offline setting. To overcome this problem, many works estimate the value function conservatively or pessimistically. However, existing methods require additional OOD sampling or uncertainty estimation to underestimate OOD values, which makes them complex and sensitive to hyperparameters. Is it possible to design a value function that is automatically conservative on OOD samples? In this study, we reveal the anti-conservation property of the widely used ReLU network under certain conditions and explain its cause theoretically. Based on this analysis of the ReLU network, we propose a novel neural network architecture that pushes down the values of samples far from the dataset; we call this new architecture the Conservative Network (ConsNet). Building on ConsNet, we propose a new offline RL algorithm that is simple to implement and achieves high performance. Because ConsNet itself supplies additional conservatism, integrating it into several existing offline RL methods significantly improves their performance or reduces their complexity. Given its simplicity and effectiveness, we hope ConsNet can serve as a new fundamental network architecture for offline RL.
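The abstract does not spell out ConsNet's internals, but its stated design goal (a network whose output is pushed down for inputs far from the dataset, in contrast to a ReLU MLP, which is piecewise linear and extrapolates linearly, so it can grow without bound on OOD inputs) can be illustrated with a rough sketch. The snippet below is a minimal illustration under assumptions, not the paper's actual architecture: the names ConservativeNet, anchors, and lam are hypothetical, and the distance-to-data penalty is one simple way to realize the stated behavior.

```python
# Minimal sketch (NOT the paper's ConsNet): a value network whose output
# is pushed down for inputs far from the offline dataset. The distance
# penalty used here is an illustrative assumption.
import torch
import torch.nn as nn


class ReLUNet(nn.Module):
    """Plain ReLU MLP: piecewise linear, so it extrapolates linearly
    outside the training data and can grow without bound (the
    'anti-conservation' behavior the abstract refers to)."""

    def __init__(self, dim=1, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)


class ConservativeNet(nn.Module):
    """Hypothetical conservative variant: subtract a distance-to-data
    penalty so predicted values decay for OOD inputs. `anchors` stands
    in for (a subset of) the offline dataset."""

    def __init__(self, anchors, dim=1, hidden=64, lam=1.0):
        super().__init__()
        self.backbone = ReLUNet(dim, hidden)
        self.register_buffer("anchors", anchors)  # shape (N, dim)
        self.lam = lam

    def forward(self, x):
        # Distance from each input to its nearest in-distribution anchor;
        # near the data this is ~0 and the backbone dominates.
        d = torch.cdist(x, self.anchors).min(dim=1, keepdim=True).values
        return self.backbone(x) - self.lam * d  # push down far-away values


if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy in-distribution inputs in [-1, 1] with a synthetic value target.
    x_data = torch.rand(256, 1) * 2 - 1
    y_data = torch.sin(3 * x_data)
    net = ConservativeNet(anchors=x_data, lam=2.0)
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(500):
        opt.zero_grad()
        loss = ((net(x_data) - y_data) ** 2).mean()
        loss.backward()
        opt.step()
    # Far from the data the penalty dominates, so the predicted value
    # drops instead of extrapolating upward as a plain ReLU net might.
    x_ood = torch.tensor([[0.0], [2.0], [5.0]])
    print(net(x_ood).squeeze().tolist())
```

The sketch bakes conservatism into the architecture itself rather than into the training objective, which is the property the abstract claims lets ConsNet simplify or strengthen existing offline RL methods; the actual mechanism in the paper may differ.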
Pages: 10