When to Trust Your Model: Model-Based Policy Optimization

Cited by: 0
Authors
Janner, Michael [1]
Fu, Justin [1]
Zhang, Marvin [1]
Levine, Sergey [1]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | 2019 / Vol. 32
Funding
U.S. National Science Foundation
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Designing effective model-based reinforcement learning algorithms is difficult because the ease of data generation must be weighed against the bias of model-generated data. In this paper, we study the role of model usage in policy optimization both theoretically and empirically. We first formulate and analyze a model-based reinforcement learning algorithm with a guarantee of monotonic improvement at each step. In practice, this analysis is overly pessimistic and suggests that real off-policy data is always preferable to model-generated on-policy data, but we show that an empirical estimate of model generalization can be incorporated into such analysis to justify model usage. Motivated by this analysis, we then demonstrate that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms without the usual pitfalls. In particular, this approach surpasses the sample efficiency of prior model-based methods, matches the asymptotic performance of the best model-free algorithms, and scales to horizons that cause other model-based methods to fail entirely.
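
The branched-rollout procedure summarized above can be pictured with a short sketch. This is not the authors' released implementation; the object names (dynamics_model, policy, real_buffer, model_buffer) and their sample/predict interfaces are placeholders assumed for illustration.

# Minimal sketch of short model rollouts branched from real data:
# states are sampled from a buffer of real transitions, the learned
# dynamics model is unrolled for only k steps, and the synthetic
# transitions are stored in a separate model buffer for policy training.

import random


def branched_rollouts(dynamics_model, policy, real_buffer, model_buffer,
                      num_rollouts=400, k=1):
    """Generate short model-based rollouts branched from real states."""
    for _ in range(num_rollouts):
        # Branch point: a state actually visited in the real environment.
        state = random.choice(real_buffer)["state"]
        for _ in range(k):  # short horizon limits compounding model error
            action = policy.sample(state)
            next_state, reward, done = dynamics_model.predict(state, action)
            model_buffer.append({"state": state, "action": action,
                                 "reward": reward, "next_state": next_state,
                                 "done": done})
            if done:
                break
            state = next_state

In the paper's setup, the dynamics model is fit only to real environment data, while the policy (optimized with soft actor-critic) trains largely on these short synthetic rollouts; keeping the rollout length k small is what the generalization analysis above is meant to justify.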
Pages: 12
Related Papers (10 of 50 shown)
  • [1] Pan, Feiyang; Cai, Qingpeng; Zeng, An-Xiang; Pan, Chun-Xiang; Da, Qing; He, Hualin; He, Qing; Tang, Pingzhong. Policy Optimization with Model-Based Explorations. Thirty-Third AAAI Conference on Artificial Intelligence / Thirty-First Innovative Applications of Artificial Intelligence Conference / Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, 2019: 4675-4682.
  • [2] Lai, Hang; Shen, Jian; Zhang, Weinan; Yu, Yong. Bidirectional Model-based Policy Optimization. International Conference on Machine Learning, 2020, Vol. 119.
  • [3] Chow, Yinlam; Cui, Brandon; Ryu, MoonKyung; Ghavamzadeh, Mohammad. Variational Model-based Policy Optimization. Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI 2021), 2021: 2292-2299.
  • [4] Shen, Jian; Zhao, Han; Zhang, Weinan; Yu, Yong. Model-based Policy Optimization with Unsupervised Model Adaptation. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020, Vol. 33.
  • [5] Ji, Tianying; Luo, Yu; Sun, Fuchun; Jing, Mingxuan; He, Fengxiang; Huang, Wenbing. When to Update Your Model: Constrained Model-based Reinforcement Learning. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [6] Pan, Feiyang; He, Jia; Tu, Dandan; He, Qing. Trust the Model When It Is Confident: Masked Model-based Actor-Critic. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020, Vol. 33.
  • [7] Li, Shuailong; Zhang, Wei; Zhang, Huiwen; Zhang, Xin; Leng, Yuquan. Proximal Policy Optimization with Model-based Methods. Journal of Intelligent & Fuzzy Systems, 2022, 42(06): 5399-5410.
  • [8] Yu, Tianhe; Thomas, Garrett; Yu, Lantao; Ermon, Stefano; Zou, James; Levine, Sergey; Finn, Chelsea; Ma, Tengyu. MOPO: Model-based Offline Policy Optimization. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020, Vol. 33.
  • [9] Shen, Jian; Lai, Hang; Liu, Minghuan; Zhao, Han; Yu, Yong; Zhang, Weinan. Adaptation Augmented Model-based Policy Optimization. Journal of Machine Learning Research, 2023, Vol. 24.
  • [10] Su, Xiaolong; Li, Peng; Chen, Shaofei. Moor: Model-based Offline Policy Optimization with a Risk Dynamics Model. Complex & Intelligent Systems, 2025, 11(01).