Hyperparameter Tuning in Offline Reinforcement Learning

Cited by: 0
Authors
Tittaferrante, Andrew [1 ]
Yassine, Abdulsalam [2 ]
Affiliations
[1] Lakehead Univ, Elect & Comp Engn, Thunder Bay, ON, Canada
[2] Lakehead Univ, Software Engn, Thunder Bay, ON, Canada
Keywords
Deep Learning; Reinforcement Learning; Offline Reinforcement Learning;
DOI
10.1109/ICMLA55696.2022.00101
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this work, we propose a reliable hyperparameter tuning scheme for offline reinforcement learning. We demonstrate the scheme on the simplest antmaze environment from D4RL, the standard benchmark of offline datasets. The usual approach to policy evaluation in offline reinforcement learning relies on online evaluation, i.e., cherry-picking the best performance observed on the test environment. To mitigate this cherry-picking, we propose an ad hoc online evaluation metric, which we name "median-median-return". This metric enables more reliable reporting because it represents the expected performance of the learned policy: it takes the median online evaluation performance across both epochs and training runs. To demonstrate the scheme, we employ IQL, a recent state-of-the-art algorithm, and perform a thorough hyperparameter search based on the proposed metric. The tuned architectures achieve notably stronger cherry-picked performance, and the best models surpass the reported state-of-the-art performance on average.
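As a rough illustration of how such a metric can be computed, the sketch below takes the median evaluation return over epochs within each training run, then the median of those values across runs. The aggregation order and the function name median_median_return are assumptions based on the abstract's description, not the paper's published code.

```python
# Minimal sketch of a "median-median-return" style metric, assuming the
# aggregation described in the abstract: median over evaluation epochs
# within each run, then median across training runs. The exact definition
# in the paper may differ.
import numpy as np

def median_median_return(returns):
    """returns: array-like of shape (n_runs, n_epochs) holding the online
    evaluation return logged at each epoch of each training run."""
    returns = np.asarray(returns, dtype=float)
    per_run = np.median(returns, axis=1)  # median over epochs, per run
    return float(np.median(per_run))      # median across training runs

# Hypothetical usage: 3 training runs x 5 evaluation epochs.
example = [
    [0.1, 0.4, 0.6, 0.5, 0.7],
    [0.0, 0.2, 0.3, 0.3, 0.2],
    [0.5, 0.6, 0.8, 0.7, 0.9],
]
print(median_median_return(example))  # prints 0.5
```

Unlike reporting the single best epoch of the single best run, this aggregate cannot be inflated by one lucky evaluation, which is the cherry-picking failure mode the abstract describes.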
Pages: 585-590
Page count: 6