Hyperparameter Tuning in Offline Reinforcement Learning

Cited: 0
Authors
Tittaferrante, Andrew [1 ]
Yassine, Abdulsalam [2 ]
Affiliations
[1] Lakehead Univ, Elect & Comp Engn, Thunder Bay, ON, Canada
[2] Lakehead Univ, Software Engn, Thunder Bay, ON, Canada
Keywords
Deep Learning; Reinforcement Learning; Offline Reinforcement Learning
DOI
10.1109/ICMLA55696.2022.00101
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this work, we propose a reliable hyperparameter tuning scheme for offline reinforcement learning. We demonstrate the scheme on the simplest antmaze environment from D4RL, the standard offline reinforcement learning benchmark. The usual approach to policy evaluation in offline reinforcement learning relies on online evaluation, i.e., cherry-picking the best performance on the test environment. To mitigate this cherry-picking, we propose an ad hoc online evaluation metric, which we name "median-median-return". This metric enables more reliable reporting of results because it estimates the expected performance of the learned policy by taking the median online evaluation performance across both epochs and training runs. To demonstrate the scheme, we employ the recent state-of-the-art algorithm IQL and perform a thorough hyperparameter search based on the proposed metric. The tuned architectures enjoy notably stronger cherry-picked performance, and the best models surpass the reported state-of-the-art performance on average.
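
The abstract defines median-median-return only in prose; the short Python sketch below gives one plausible reading, assuming online evaluation returns are logged once per epoch for each of several training runs (the function name and toy numbers are illustrative, not taken from the paper): compute each run's median return over epochs, then take the median of those per-run values across runs.

import numpy as np

def median_median_return(returns):
    # returns: array of shape (n_runs, n_epochs), holding the online
    # evaluation return logged at each epoch of each training run.
    returns = np.asarray(returns)
    per_run_median = np.median(returns, axis=1)  # median over epochs, one value per run
    return np.median(per_run_median)             # median over training runs

# Toy example: 3 training runs x 4 evaluation epochs.
scores = [[0.2, 0.5, 0.4, 0.6],
          [0.1, 0.3, 0.3, 0.2],
          [0.7, 0.6, 0.8, 0.5]]
print(median_median_return(scores))  # 0.45

Using medians rather than means makes the summary robust to an occasional diverged run or unlucky evaluation epoch, a plausible reason to prefer it for reliable reporting.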
Pages: 585-590
Page count: 6
Related Papers
50 records in total
  • [41] Efficient Online Hyperparameter Adaptation for Deep Reinforcement Learning
    Zhou, Yinda
    Liu, Weiming
    Li, Bin
    APPLICATIONS OF EVOLUTIONARY COMPUTATION, EVOAPPLICATIONS 2019, 2019, 11454: 141-155
  • [42] Learning to Influence Human Behavior with Offline Reinforcement Learning
    Hong, Joey
    Levine, Sergey
    Dragan, Anca
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [43] A Review of Offline Reinforcement Learning Based on Representation Learning
    Wang, X.-S.
    Wang, R.-R.
    Cheng, Y.-H.
    Zidonghua Xuebao/Acta Automatica Sinica, 2024, 50(06): 1104-1128
  • [44] Offline Policy Iteration Based Reinforcement Learning Controller for Online Robotic Knee Prosthesis Parameter Tuning
    Li, Minhan
    Gao, Xiang
    Wen, Yue
    Si, Jennie
    Huang, He
    2019 INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2019: 2831-2837
  • [45] Discrete Uncertainty Quantification For Offline Reinforcement Learning
    Perez, Jose Luis
    Corrochano, Javier
    Garcia, Javier
    Majadas, Ruben
    Ibanez-Llano, Cristina
    Perez, Sergio
    Fernandez, Fernando
    JOURNAL OF ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING RESEARCH, 2023, 13(04): 273-287
  • [46] Supported Value Regularization for Offline Reinforcement Learning
    Mao, Yixiu
    Zhang, Hongchang
    Chen, Chen
    Xu, Yi
    Ji, Xiangyang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [47] Deadly triad matters for offline reinforcement learning
    Peng, Zhiyong
    Liu, Yadong
    Zhou, Zongtan
    KNOWLEDGE-BASED SYSTEMS, 2024, 284
  • [48] Robust Reinforcement Learning using Offline Data
    Panaganti, Kishan
    Xu, Zaiyan
    Kalathil, Dileep
    Ghavamzadeh, Mohammad
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [49] Fast Rates for the Regret of Offline Reinforcement Learning
    Hu, Yichun
    Kallus, Nathan
    Uehara, Masatoshi
    MATHEMATICS OF OPERATIONS RESEARCH, 2025, 50(01)
  • [50] Boundary Data Augmentation for Offline Reinforcement Learning
    Shen, Jiahao
    Jiang, Ke
    Tan, Xiaoyang
    ZTE Communications, 2023, 21(03): 29-36