When to Trust Your Model: Model-Based Policy Optimization

Cited by: 0
Authors
Janner, Michael [1]
Fu, Justin [1]
Zhang, Marvin [1]
Levine, Sergey [1]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | 2019 / Vol. 32
Funding
U.S. National Science Foundation
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Designing effective model-based reinforcement learning algorithms is difficult because the ease of data generation must be weighed against the bias of model-generated data. In this paper, we study the role of model usage in policy optimization both theoretically and empirically. We first formulate and analyze a model-based reinforcement learning algorithm with a guarantee of monotonic improvement at each step. In practice, this analysis is overly pessimistic and suggests that real off-policy data is always preferable to model-generated on-policy data, but we show that an empirical estimate of model generalization can be incorporated into such analysis to justify model usage. Motivated by this analysis, we then demonstrate that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms without the usual pitfalls. In particular, this approach surpasses the sample efficiency of prior model-based methods, matches the asymptotic performance of the best model-free algorithms, and scales to horizons that cause other model-based methods to fail entirely.
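
The branched-rollout procedure summarized above can be pictured with a short sketch. This is not the authors' released implementation; the object names (dynamics_model, policy, real_buffer, model_buffer) and their sample/predict interfaces are placeholders assumed for illustration.

# Minimal sketch of short model rollouts branched from real data:
# states are sampled from a buffer of real transitions, the learned
# dynamics model is unrolled for only k steps, and the synthetic
# transitions are stored in a separate model buffer for policy training.

import random


def branched_rollouts(dynamics_model, policy, real_buffer, model_buffer,
                      num_rollouts=400, k=1):
    """Generate short model-based rollouts branched from real states."""
    for _ in range(num_rollouts):
        # Branch point: a state actually visited in the real environment.
        state = random.choice(real_buffer)["state"]
        for _ in range(k):  # short horizon limits compounding model error
            action = policy.sample(state)
            next_state, reward, done = dynamics_model.predict(state, action)
            model_buffer.append({"state": state, "action": action,
                                 "reward": reward, "next_state": next_state,
                                 "done": done})
            if done:
                break
            state = next_state

In the paper's setup, the dynamics model is fit only to real environment data, while the policy (optimized with soft actor-critic) trains largely on these short synthetic rollouts; keeping the rollout length k small is what the generalization analysis above is meant to justify.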
Pages: 12
Related Papers (10 of 50 shown)
  • [1] Pan, Feiyang; Cai, Qingpeng; Zeng, An-Xiang; Pan, Chun-Xiang; Da, Qing; He, Hualin; He, Qing; Tang, Pingzhong. Policy Optimization with Model-Based Explorations. Thirty-Third AAAI Conference on Artificial Intelligence / Thirty-First Innovative Applications of Artificial Intelligence Conference / Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, 2019: 4675-4682.
  • [2] Lai, Hang; Shen, Jian; Zhang, Weinan; Yu, Yong. Bidirectional Model-based Policy Optimization. International Conference on Machine Learning, 2020, Vol. 119.
  • [3] Chow, Yinlam; Cui, Brandon; Ryu, MoonKyung; Ghavamzadeh, Mohammad. Variational Model-based Policy Optimization. Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI 2021), 2021: 2292-2299.
  • [4] Shen, Jian; Zhao, Han; Zhang, Weinan; Yu, Yong. Model-based Policy Optimization with Unsupervised Model Adaptation. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020, Vol. 33.
  • [5] Ji, Tianying; Luo, Yu; Sun, Fuchun; Jing, Mingxuan; He, Fengxiang; Huang, Wenbing. When to Update Your Model: Constrained Model-based Reinforcement Learning. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [6] Pan, Feiyang; He, Jia; Tu, Dandan; He, Qing. Trust the Model When It Is Confident: Masked Model-based Actor-Critic. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020, Vol. 33.
  • [7] Li, Shuailong; Zhang, Wei; Zhang, Huiwen; Zhang, Xin; Leng, Yuquan. Proximal Policy Optimization with Model-based Methods. Journal of Intelligent & Fuzzy Systems, 2022, 42(06): 5399-5410.
  • [8] Yu, Tianhe; Thomas, Garrett; Yu, Lantao; Ermon, Stefano; Zou, James; Levine, Sergey; Finn, Chelsea; Ma, Tengyu. MOPO: Model-based Offline Policy Optimization. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020, Vol. 33.
  • [9] Shen, Jian; Lai, Hang; Liu, Minghuan; Zhao, Han; Yu, Yong; Zhang, Weinan. Adaptation Augmented Model-based Policy Optimization. Journal of Machine Learning Research, 2023, Vol. 24.
  • [10] Su, Xiaolong; Li, Peng; Chen, Shaofei. Moor: Model-based Offline Policy Optimization with a Risk Dynamics Model. Complex & Intelligent Systems, 2025, 11(01).