Optimistic Planning for Belief-Augmented Markov Decision Processes

Authors
Fonteneau, Raphael [1, 2]
Busoniu, Lucian [3, 4]
Munos, Remi [2 ]
Affiliations
[1] Univ Liege, Dept Elect Engn & Comp Sci, B-4000 Liege, Belgium
[2] Inria Lille Nord Europe, Team SequeL, Lille, France
[3] Univ Lorraine, CRAN, UMR 7039, Nancy, France
[4] CNRS, CRAN, UMR 7039, Nancy, France
Keywords
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper presents the Bayesian Optimistic Planning (BOP) algorithm, a novel model-based Bayesian reinforcement learning approach. BOP extends the planning approach of the Optimistic Planning for Markov Decision Processes (OP-MDP) algorithm [10], [9] to settings where the transition model of the MDP is initially unknown and is learned progressively through interactions with the environment. Knowledge about the unknown MDP is represented as a probability distribution over all possible transition models, using Dirichlet distributions. BOP plans in the belief-augmented state space, which is constructed by concatenating the original state vector with the current posterior distribution over transition models. We show that BOP becomes Bayesian optimal as the budget parameter tends to infinity. Preliminary empirical validation shows promising performance.
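The belief representation described in the abstract can be illustrated with a minimal sketch. It assumes a finite MDP and one independent Dirichlet distribution per state-action pair, stored as counts. The function names (`make_belief`, `update_belief`, `expected_transition`, `augmented_state`) are hypothetical and are not taken from the paper, and the sketch covers only the belief update and the belief-augmented state, not the BOP planning procedure itself.

```python
from fractions import Fraction


def make_belief(n_states, n_actions, prior=1):
    """Dirichlet counts alpha[s][a][s_next]; prior=1 is a uniform prior (an assumption)."""
    return [[[prior] * n_states for _ in range(n_actions)] for _ in range(n_states)]


def update_belief(belief, s, a, s_next):
    """Return the posterior counts after observing (s, a, s_next); the input is not modified."""
    new = [[list(row) for row in per_state] for per_state in belief]
    new[s][a][s_next] += 1
    return new


def expected_transition(belief, s, a):
    """Posterior mean of P(. | s, a), i.e. the normalised Dirichlet counts."""
    counts = belief[s][a]
    total = sum(counts)
    return [Fraction(c, total) for c in counts]


def augmented_state(s, belief):
    """Hashable belief-augmented state: the original state paired with the counts."""
    frozen = tuple(tuple(tuple(row) for row in per_state) for per_state in belief)
    return (s, frozen)


# Usage: two states, two actions, one observed transition 0 --(action 1)--> 1.
prior_belief = make_belief(2, 2)
posterior_belief = update_belief(prior_belief, 0, 1, 1)
print(expected_transition(posterior_belief, 0, 1))
```

Because the counts are a sufficient statistic of the Dirichlet posterior, pairing them with the current state gives one possible concrete encoding of a belief-augmented state; other encodings are equally plausible.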
Pages: 77-84
Number of pages: 8