Model-based Reinforcement Learning and the Eluder Dimension

Cited by: 0
Authors
Osband, Ian [1 ]
Van Roy, Benjamin [1 ]
Institutions
[1] Stanford Univ, Stanford, CA 94305 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014) | 2014 / Vol. 27
Funding
National Science Foundation (USA)
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of learning to optimize an unknown Markov decision process (MDP). We show that, if the MDP can be parameterized within some known function class, we can obtain regret bounds that scale with the dimensionality, rather than the cardinality, of the system. We characterize this dependence explicitly as $\tilde{O}(\sqrt{d_K d_E T})$, where $T$ is the time elapsed, $d_K$ is the Kolmogorov dimension, and $d_E$ is the eluder dimension. These are the first unified regret bounds for model-based reinforcement learning, and they provide state-of-the-art guarantees in several important settings. Moreover, we present a simple and computationally efficient algorithm, posterior sampling for reinforcement learning (PSRL), that satisfies these bounds.
Pages: 9
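
For intuition, here is a minimal sketch of the PSRL algorithm described in the abstract, under simplifying assumptions the paper does not make: a tabular, finite-horizon MDP with known mean rewards and an independent Dirichlet posterior over each next-state distribution. The class and parameter names are illustrative; the paper's regret analysis covers general parameterized model classes, not just this tabular case.

```python
# A minimal sketch of posterior sampling for reinforcement learning (PSRL).
# Assumptions not made by the paper: tabular state/action spaces, a fixed
# finite horizon, known mean rewards, and an independent Dirichlet posterior
# over each next-state distribution P(. | s, a).
import numpy as np


class PSRL:
    def __init__(self, n_states, n_actions, horizon, rewards, prior=1.0, seed=0):
        self.S, self.A, self.H = n_states, n_actions, horizon
        self.R = rewards                      # known mean rewards, shape (S, A)
        # Dirichlet pseudo-counts for P(. | s, a); `prior` gives a uniform prior.
        self.counts = np.full((n_states, n_actions, n_states), float(prior))
        self.rng = np.random.default_rng(seed)

    def sample_mdp(self):
        """Draw one complete transition model from the current posterior."""
        P = np.empty_like(self.counts)
        for s in range(self.S):
            for a in range(self.A):
                P[s, a] = self.rng.dirichlet(self.counts[s, a])
        return P

    def plan(self, P):
        """Finite-horizon value iteration (backward induction) on the sample."""
        V = np.zeros(self.S)
        policy = np.zeros((self.H, self.S), dtype=int)
        for h in reversed(range(self.H)):
            Q = self.R + P @ V               # Q[s, a] = r(s, a) + E[V(s')]
            policy[h] = Q.argmax(axis=1)
            V = Q.max(axis=1)
        return policy

    def update(self, s, a, s_next):
        """Conjugate Dirichlet update from one observed transition."""
        self.counts[s, a, s_next] += 1.0


if __name__ == "__main__":
    # Toy demo on a randomly generated 5-state, 2-action MDP (hypothetical).
    rng = np.random.default_rng(1)
    S, A, H = 5, 2, 10
    true_P = rng.dirichlet(np.ones(S), size=(S, A))  # ground-truth transitions
    rewards = rng.uniform(size=(S, A))
    agent = PSRL(S, A, H, rewards)
    for episode in range(200):
        policy = agent.plan(agent.sample_mdp())      # sample, then plan
        s = 0
        for h in range(H):
            a = policy[h, s]
            s_next = rng.choice(S, p=true_P[s, a])
            agent.update(s, a, s_next)
            s = s_next
```

Each episode the agent draws a single MDP from the posterior, solves it by backward induction, acts greedily on the resulting policy, and folds the observed transitions back into the Dirichlet counts. The randomness of the posterior draw is the only source of exploration, which is what keeps PSRL computationally simple.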