Hyperparameter Tuning in Offline Reinforcement Learning

Cited by: 0
Authors
Tittaferrante, Andrew [1 ]
Yassine, Abdulsalam [2 ]
Institutions
[1] Lakehead Univ, Elect & Comp Engn, Thunder Bay, ON, Canada
[2] Lakehead Univ, Software Engn, Thunder Bay, ON, Canada
Keywords
Deep Learning; Reinforcement Learning; Offline Reinforcement Learning;
DOI
10.1109/ICMLA55696.2022.00101
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work, we propose a reliable hyperparameter tuning scheme for offline reinforcement learning. We demonstrate our proposed scheme using the simplest antmaze environment from the standard benchmark offline dataset, D4RL. The usual approach to policy evaluation in offline reinforcement learning involves online evaluation, i.e., cherry-picking the best performance on the test environment. To mitigate this cherry-picking, we propose an ad-hoc online evaluation metric, which we name "median-median-return". This metric enables more reliable reporting of results because it represents the expected performance of the learned policy, taking the median online evaluation performance across both epochs and training runs. To demonstrate our scheme, we employ the recent state-of-the-art algorithm IQL and perform a thorough hyperparameter search based on our proposed metric. The tuned architectures enjoy notably stronger cherry-picked performance, and the best models surpass the reported state-of-the-art performance on average.
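The abstract describes the metric but not its exact computation. A minimal sketch of one plausible reading, assuming returns are first aggregated by median over evaluation epochs within each training run and then by median across runs (the order of aggregation is an assumption, not stated in the abstract):

```python
import numpy as np

def median_median_return(returns):
    """Hypothetical 'median-median-return' metric.

    returns: array-like of shape (n_runs, n_epochs), where returns[i, j]
    is the online evaluation return of run i at evaluation epoch j.
    """
    returns = np.asarray(returns, dtype=float)
    per_run_median = np.median(returns, axis=1)  # median over epochs, per run
    return float(np.median(per_run_median))      # median over training runs

# Example: 3 training runs, 4 evaluation epochs each
scores = [
    [0.2, 0.5, 0.4, 0.6],
    [0.1, 0.3, 0.2, 0.4],
    [0.5, 0.7, 0.6, 0.8],
]
print(median_median_return(scores))  # → 0.45
```

Because the median is taken across both axes, a single lucky epoch or a single outlier run cannot dominate the reported score, which is the robustness property the abstract attributes to the metric.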
Pages: 585-590
Page count: 6
Related Papers
50 entries in total
  • [1] Offline Reinforcement Learning with Behavioral Supervisor Tuning
    Srinivasan, Padmanaba
    Knottenbelt, William
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 4929 - 4937
  • [2] Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning
    Zhang, Siyuan
    Jiang, Nan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [3] Automated Hyperparameter Tuning in Reinforcement Learning for Quadrupedal Robot Locomotion
    Kim, Myeongseop
    Kim, Jung-Su
    Park, Jae-Han
    ELECTRONICS, 2024, 13 (01)
  • [4] Online weighted Q-ensembles for reduced hyperparameter tuning in reinforcement learning
    Garcia, R.
    Caarls, W.
    Soft Computing, 2024, 28 (13-14) : 8549 - 8559
  • [5] Online Tuning for Offline Decentralized Multi-Agent Reinforcement Learning
    Jiang, Jiechuan
    Lu, Zongqing
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 7, 2023, : 8050 - +
  • [6] Fine-Tuning a Personalized OpenBioLLM Using Offline Reinforcement Learning
    Shi, Jinsheng
    Yuan, Yuyu
    Wang, Ao
    Nie, Meng
    APPLIED SCIENCES-BASEL, 2025, 15 (05)
  • [7] Meta-reinforcement learning for the tuning of PI controllers: An offline approach
    McClement, Daniel G.
    Lawrence, Nathan P.
    Backstroem, Johan U.
    Loewen, Philip D.
    Forbes, Michael G.
    Gopaluni, R. Bhushan
    JOURNAL OF PROCESS CONTROL, 2022, 118 : 139 - 152
  • [8] Hyperparameter Tuning of an Off-Policy Reinforcement Learning Algorithm for H∞ Tracking Control
    Farahmandi, Alireza
    Reitz, Brian
    Debord, Mark
    Philbrick, Douglas
    Estabridis, Katia
    Hewer, Gary
    LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211
  • [9] Automatic Hyperparameter Tuning in Deep Convolutional Neural Networks Using Asynchronous Reinforcement Learning
    Neary, Patrick L.
    2018 IEEE INTERNATIONAL CONFERENCE ON COGNITIVE COMPUTING (ICCC), 2018, : 73 - 77
  • [10] Batch Reinforcement Learning with Hyperparameter Gradients
    Lee, Byung-Jun
    Lee, Jongmin
    Vrancx, Peter
    Kim, Dongho
    Kim, Kee-Eung
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119