Hyperparameter Tuning in Offline Reinforcement Learning

Cited by: 0
Authors
Tittaferrante, Andrew [1 ]
Yassine, Abdulsalam [2 ]
Affiliations
[1] Lakehead Univ, Elect & Comp Engn, Thunder Bay, ON, Canada
[2] Lakehead Univ, Software Engn, Thunder Bay, ON, Canada
Keywords
Deep Learning; Reinforcement Learning; Offline Reinforcement Learning;
DOI
10.1109/ICMLA55696.2022.00101
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this work, we propose a reliable hyperparameter tuning scheme for offline reinforcement learning. We demonstrate the scheme on the simplest antmaze environment from D4RL, the standard benchmark of offline datasets. The usual approach to policy evaluation in offline reinforcement learning relies on online evaluation, i.e., cherry-picking the best performance observed on the test environment. To mitigate this cherry-picking, we propose an ad hoc online evaluation metric, which we name "median-median-return". This metric enables more reliable reporting because it represents the expected performance of the learned policy: it takes the median online evaluation performance across both epochs and training runs. To demonstrate the scheme, we employ IQL, a recent state-of-the-art algorithm, and perform a thorough hyperparameter search based on the proposed metric. The tuned architectures achieve notably stronger cherry-picked performance, and the best models surpass the reported state-of-the-art performance on average.
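As a rough illustration of how such a metric can be computed, the sketch below takes the median evaluation return over epochs within each training run, then the median of those values across runs. The aggregation order and the function name median_median_return are assumptions based on the abstract's description, not the paper's published code.

```python
# Minimal sketch of a "median-median-return" style metric, assuming the
# aggregation described in the abstract: median over evaluation epochs
# within each run, then median across training runs. The exact definition
# in the paper may differ.
import numpy as np

def median_median_return(returns):
    """returns: array-like of shape (n_runs, n_epochs) holding the online
    evaluation return logged at each epoch of each training run."""
    returns = np.asarray(returns, dtype=float)
    per_run = np.median(returns, axis=1)  # median over epochs, per run
    return float(np.median(per_run))      # median across training runs

# Hypothetical usage: 3 training runs x 5 evaluation epochs.
example = [
    [0.1, 0.4, 0.6, 0.5, 0.7],
    [0.0, 0.2, 0.3, 0.3, 0.2],
    [0.5, 0.6, 0.8, 0.7, 0.9],
]
print(median_median_return(example))  # prints 0.5
```

Unlike reporting the single best epoch of the single best run, this aggregate cannot be inflated by one lucky evaluation, which is the cherry-picking failure mode the abstract describes.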
Pages: 585-590
Page count: 6