On Effective Scheduling of Model-based Reinforcement Learning

Cited by: 0

Authors:

Lai, Hang [1]
Shen, Jian [1]
Zhang, Weinan [1]
Huang, Yimin [2]
Zhang, Xing [2]
Tang, Ruiming [2]
Yu, Yong [1]
Li, Zhenguo [2]

Affiliations:

[1] Shanghai Jiao Tong University, Shanghai, People's Republic of China

[2] Huawei Noah's Ark Lab, Montreal, PQ, Canada

Source:

Advances in Neural Information Processing Systems 34 (NeurIPS 2021) | 2021, Vol. 34

Funding:

National Natural Science Foundation of China

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Model-based reinforcement learning has attracted wide attention due to its superior sample efficiency. Despite its impressive success so far, it is still unclear how to appropriately schedule important hyperparameters, such as the real data ratio for policy optimization in Dyna-style model-based algorithms, to achieve adequate performance. In this paper, we first theoretically analyze the role of real data in policy training; the analysis suggests that gradually increasing the ratio of real data yields better performance. Inspired by the analysis, we propose a framework named AutoMBPO to automatically schedule the real data ratio as well as other hyperparameters when training the model-based policy optimization (MBPO) algorithm, a representative running case of model-based methods. On several continuous control tasks, the MBPO instance trained with hyperparameters scheduled by AutoMBPO can significantly surpass the original one, and the real data ratio schedule found by AutoMBPO is consistent with our theoretical analysis.
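To make the notion of a "real data ratio" concrete, the following is a minimal Python sketch of a Dyna-style batch sampler whose share of real transitions grows over training. It is not AutoMBPO, which learns its schedule automatically; the linear schedule, the function names `real_ratio_schedule` and `sample_mixed_batch`, and the endpoint values 0.05 and 0.5 are all hypothetical choices made for illustration only.

```python
import random


def real_ratio_schedule(step, total_steps, start=0.05, end=0.5):
    """Hand-written linear schedule (an assumption, not the learned one).

    Returns the fraction of each policy-training batch that should come
    from real environment data, increasing from `start` to `end`.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return start + frac * (end - start)


def sample_mixed_batch(real_buffer, model_buffer, batch_size, real_ratio, rng=random):
    """Draw a batch mixing real and model-generated transitions.

    About `real_ratio` of the batch is sampled (with replacement) from
    `real_buffer`; the rest comes from `model_buffer`.
    """
    n_real = min(int(round(batch_size * real_ratio)), batch_size)
    n_model = batch_size - n_real
    batch = rng.choices(real_buffer, k=n_real) + rng.choices(model_buffer, k=n_model)
    rng.shuffle(batch)  # avoid a fixed real/model ordering inside the batch
    return batch


if __name__ == "__main__":
    # Toy buffers: each transition is tagged with its origin.
    real_buffer = [("real", i) for i in range(100)]
    model_buffer = [("model", i) for i in range(1000)]
    for step in (0, 50, 100):
        ratio = real_ratio_schedule(step, total_steps=100)
        batch = sample_mixed_batch(real_buffer, model_buffer, 20, ratio)
        n_real = sum(1 for origin, _ in batch if origin == "real")
        print(step, n_real, len(batch))
```

In this sketch the policy would be trained on each mixed batch; replacing the linear function with a learned scheduler is where a framework such as AutoMBPO would come in.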

Pages: 12
