Hyperparameter Tuning in Offline Reinforcement Learning

Cited by: 0
Authors
Tittaferrante, Andrew [1 ]
Yassine, Abdulsalam [2 ]
Institutions
[1] Lakehead Univ, Elect & Comp Engn, Thunder Bay, ON, Canada
[2] Lakehead Univ, Software Engn, Thunder Bay, ON, Canada
Keywords
Deep Learning; Reinforcement Learning; Offline Reinforcement Learning;
DOI
10.1109/ICMLA55696.2022.00101
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work, we propose a reliable hyperparameter tuning scheme for offline reinforcement learning. We demonstrate our proposed scheme using the simplest antmaze environment from the standard benchmark offline dataset, D4RL. The usual approach to policy evaluation in offline reinforcement learning involves online evaluation, i.e., cherry-picking the best performance on the test environment. To mitigate this cherry-picking, we propose an ad-hoc online evaluation metric, which we name "median-median-return". This metric enables more reliable reporting of results because it represents the expected performance of the learned policy, taking the median online evaluation performance across both epochs and training runs. To demonstrate our scheme, we employ the recent state-of-the-art algorithm IQL and perform a thorough hyperparameter search based on our proposed metric. The tuned architectures enjoy notably stronger cherry-picked performance, and the best models surpass the reported state-of-the-art performance on average.
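The abstract describes the metric but not its exact computation. A minimal sketch of one plausible reading, assuming returns are first aggregated by median over evaluation epochs within each training run and then by median across runs (the order of aggregation is an assumption, not stated in the abstract):

```python
import numpy as np

def median_median_return(returns):
    """Hypothetical 'median-median-return' metric.

    returns: array-like of shape (n_runs, n_epochs), where returns[i, j]
    is the online evaluation return of run i at evaluation epoch j.
    """
    returns = np.asarray(returns, dtype=float)
    per_run_median = np.median(returns, axis=1)  # median over epochs, per run
    return float(np.median(per_run_median))      # median over training runs

# Example: 3 training runs, 4 evaluation epochs each
scores = [
    [0.2, 0.5, 0.4, 0.6],
    [0.1, 0.3, 0.2, 0.4],
    [0.5, 0.7, 0.6, 0.8],
]
print(median_median_return(scores))  # → 0.45
```

Because the median is taken across both axes, a single lucky epoch or a single outlier run cannot dominate the reported score, which is the robustness property the abstract attributes to the metric.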
Pages: 585-590
Page count: 6
Related Papers
50 entries in total
  • [1] Offline Reinforcement Learning with Behavioral Supervisor Tuning
    Srinivasan, Padmanaba
    Knottenbelt, William
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 4929 - 4937
  • [2] Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning
    Zhang, Siyuan
    Jiang, Nan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [3] Automated Hyperparameter Tuning in Reinforcement Learning for Quadrupedal Robot Locomotion
    Kim, Myeongseop
    Kim, Jung-Su
    Park, Jae-Han
    ELECTRONICS, 2024, 13 (01)
  • [4] Online weighted Q-ensembles for reduced hyperparameter tuning in reinforcement learning
    Garcia, R.
    Caarls, W.
    Soft Computing, 2024, 28 (13-14) : 8549 - 8559
  • [5] Online Tuning for Offline Decentralized Multi-Agent Reinforcement Learning
    Jiang, Jiechuan
    Lu, Zongqing
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 7, 2023, : 8050 - +
  • [6] Fine-Tuning a Personalized OpenBioLLM Using Offline Reinforcement Learning
    Shi, Jinsheng
    Yuan, Yuyu
    Wang, Ao
    Nie, Meng
    APPLIED SCIENCES-BASEL, 2025, 15 (05)
  • [7] Meta-reinforcement learning for the tuning of PI controllers: An offline approach
    McClement, Daniel G.
    Lawrence, Nathan P.
    Backstroem, Johan U.
    Loewen, Philip D.
    Forbes, Michael G.
    Gopaluni, R. Bhushan
    JOURNAL OF PROCESS CONTROL, 2022, 118 : 139 - 152
  • [8] Hyperparameter Tuning of an Off-Policy Reinforcement Learning Algorithm for H∞ Tracking Control
    Farahmandi, Alireza
    Reitz, Brian
    Debord, Mark
    Philbrick, Douglas
    Estabridis, Katia
    Hewer, Gary
    LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211
  • [9] Automatic Hyperparameter Tuning in Deep Convolutional Neural Networks Using Asynchronous Reinforcement Learning
    Neary, Patrick L.
    2018 IEEE INTERNATIONAL CONFERENCE ON COGNITIVE COMPUTING (ICCC), 2018, : 73 - 77
  • [10] Batch Reinforcement Learning with Hyperparameter Gradients
    Lee, Byung-Jun
    Lee, Jongmin
    Vrancx, Peter
    Kim, Dongho
    Kim, Kee-Eung
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119