Optimistic Planning for Belief-Augmented Markov Decision Processes

Cited by: 0
Authors
Fonteneau, Raphael [1 ,2 ]
Busoniu, Lucian [3 ,4 ]
Munos, Remi [2 ]
Affiliations
[1] Univ Liege, Dept Elect Engn & Comp Sci, B-4000 Liege, Belgium
[2] Inria Lille Nord Europe, Team SequeL, Lille, France
[3] Univ Lorraine, CRAN, UMR 7039, Nancy, France
[4] CNRS, CRAN, UMR 7039, Nancy, France
Keywords
DOI
Not available
CLC number
TP301 [Theory, Methods]
Subject classification
081202
Abstract
This paper presents the Bayesian Optimistic Planning (BOP) algorithm, a novel model-based Bayesian reinforcement learning approach. BOP extends the Optimistic Planning for Markov Decision Processes (OP-MDP) algorithm [10], [9] to settings where the transition model of the MDP is initially unknown and progressively learned through interaction with the environment. Knowledge about the unknown MDP is represented by a probability distribution over all possible transition models, using Dirichlet distributions, and BOP plans in the belief-augmented state space obtained by concatenating the original state vector with the current posterior distribution over transition models. We show that BOP becomes Bayesian optimal as the budget parameter increases to infinity. Preliminary empirical validation shows promising performance.
Pages: 77–84
Page count: 8
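
As a concrete illustration of the belief representation described in the abstract, the sketch below maintains a Dirichlet posterior over the transition model of a discrete MDP: transition counts are updated after each observation, and the posterior mean or a sampled model can be queried. This is a minimal sketch assuming known, finite state and action sets; the names DirichletBelief, mean_model, and sample_model are illustrative, not from the paper, and the optimistic tree search over belief-augmented states is not shown.

```python
import numpy as np


class DirichletBelief:
    """Posterior over an unknown MDP transition model.

    For each (state, action) pair, a Dirichlet distribution over the
    next-state distribution is kept, parameterized by transition counts.
    A belief-augmented state in the sense of the abstract would pair the
    physical state with this count tensor.
    """

    def __init__(self, n_states, n_actions, prior=1.0):
        # Symmetric Dirichlet prior: `prior` pseudo-counts per transition.
        self.counts = np.full((n_states, n_actions, n_states), prior)

    def update(self, s, a, s_next):
        """Bayesian update after observing the transition (s, a) -> s_next."""
        self.counts[s, a, s_next] += 1.0

    def mean_model(self):
        """Posterior-mean transition probabilities P(s' | s, a)."""
        return self.counts / self.counts.sum(axis=2, keepdims=True)

    def sample_model(self, rng):
        """Sample one full transition model from the posterior."""
        n_s, n_a, _ = self.counts.shape
        model = np.empty_like(self.counts)
        for s in range(n_s):
            for a in range(n_a):
                model[s, a] = rng.dirichlet(self.counts[s, a])
        return model


# Minimal usage: observe one transition, then inspect the posterior.
rng = np.random.default_rng(0)
belief = DirichletBelief(n_states=3, n_actions=2)
belief.update(s=0, a=1, s_next=2)
print(belief.mean_model()[0, 1])  # posterior mean over next states for (s=0, a=1)
```

In these terms, a planner over the belief-augmented state space simulates transitions that update the counts as well as the state, so it can reason about how its own future observations would sharpen the posterior.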