Stochastic optimization of controlled partially observable Markov decision processes

Cited: 0
|
Authors
Bartlett, PL [1 ]
Baxter, J [1 ]
Institution
[1] Australian Natl Univ, Res Sch Info Sci & Eng, Canberra, ACT 0200, Australia
Source
PROCEEDINGS OF THE 39TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5 | 2000
Keywords
DOI
None
Chinese Library Classification
TP [Automation, Computer Technology];
Discipline Code
0812 ;
Abstract
We introduce an on-line algorithm for finding local maxima of the average reward in a Partially Observable Markov Decision Process (POMDP) controlled by a parameterized policy. Optimization is over the parameters of the policy. The algorithm's chief advantages are that it requires only a single sample path of the POMDP, it uses only one free parameter β ∈ [0, 1), which has a natural interpretation in terms of a bias-variance trade-off, and it requires no knowledge of the underlying state. In addition, the algorithm can be applied to infinite state, control and observation spaces. We prove almost-sure convergence of our algorithm, and show how the correct setting of β is related to the mixing time of the Markov chain induced by the POMDP.
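The abstract describes a single-sample-path policy-gradient estimator whose only free parameter, β ∈ [0, 1), discounts an eligibility trace of score functions. A minimal sketch of that kind of estimator is below; the function names (`gpomdp_estimate`, `sample_action`, `grad_log_policy`, `step`) and the toy environment are illustrative assumptions, not the paper's own notation or code.

```python
import numpy as np

def gpomdp_estimate(sample_action, grad_log_policy, step, obs0, theta,
                    beta=0.9, horizon=10_000, rng=None):
    """Sketch of a single-path gradient estimate for the average reward.

    beta in [0, 1) trades bias (small beta) against variance (beta near 1);
    the paper relates the right choice to the mixing time of the induced chain.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    z = np.zeros_like(theta, dtype=float)      # eligibility trace of grad-log-probs
    delta = np.zeros_like(theta, dtype=float)  # running gradient estimate
    obs = obs0
    for t in range(horizon):
        action = sample_action(theta, obs, rng)
        z = beta * z + grad_log_policy(theta, obs, action)
        obs, reward = step(obs, action, rng)
        # Incremental average of reward-weighted traces.
        delta += (reward * z - delta) / (t + 1)
    return delta

# Toy problem (hypothetical): one fixed observation, two actions under a
# softmax policy; action 0 always pays reward 1, action 1 pays 0.
def sample_action(theta, obs, rng):
    p = np.exp(theta - theta.max())
    p /= p.sum()
    return int(rng.choice(2, p=p))

def grad_log_policy(theta, obs, action):
    p = np.exp(theta - theta.max())
    p /= p.sum()
    g = -p
    g[action] += 1.0
    return g

def step(obs, action, rng):
    return obs, (1.0 if action == 0 else 0.0)

grad = gpomdp_estimate(sample_action, grad_log_policy, step,
                       obs0=0, theta=np.zeros(2), beta=0.9, horizon=20_000)
# The estimate should favour raising the logit of the rewarding action.
```

Note that only the policy's action probabilities and the observed rewards enter the update; no knowledge of the underlying state is used, matching the single-path, state-free character claimed in the abstract.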
Pages: 124 - 129
Page count: 6
Related papers
50 in total
  • [31] Qualitative Analysis of Partially-Observable Markov Decision Processes
    Chatterjee, Krishnendu
    Doyen, Laurent
    Henzinger, Thomas A.
    MATHEMATICAL FOUNDATIONS OF COMPUTER SCIENCE 2010, 2010, 6281 : 258 - 269
  • [32] Equivalence Relations in Fully and Partially Observable Markov Decision Processes
    Castro, Pablo Samuel
    Panangaden, Prakash
    Precup, Doina
    21ST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-09), PROCEEDINGS, 2009, : 1653 - 1658
  • [33] Recursively-Constrained Partially Observable Markov Decision Processes
    Ho, Qi Heng
    Becker, Tyler
    Kraske, Benjamin
    Laouar, Zakariya
    Feather, Martin S.
    Rossi, Federico
    Lahijanian, Morteza
    Sunberg, Zachary
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2024, 244 : 1658 - 1680
  • [34] A Fast Approximation Method for Partially Observable Markov Decision Processes
    LIU Bingbing
    KANG Yu
    JIANG Xiaofeng
    QIN Jiahu
Journal of Systems Science & Complexity, 2018, 31 (06) : 1423 - 1436
  • [35] Active Chemical Sensing With Partially Observable Markov Decision Processes
    Gosangi, Rakesh
    Gutierrez-Osuna, Ricardo
    OLFACTION AND ELECTRONIC NOSE, PROCEEDINGS, 2009, 1137 : 562 - 565
  • [36] Reinforcement learning algorithm for partially observable Markov decision processes
    Wang, Xue-Ning
    He, Han-Gen
    Xu, Xin
Kongzhi yu Juece/Control and Decision, 2004, 19 (11) : 1263 - 1266
  • [37] Partially Observable Markov Decision Processes and Performance Sensitivity Analysis
    Li, Yanjie
    Yin, Baoqun
    Xi, Hongsheng
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2008, 38 (06): : 1645 - 1651
  • [38] Learning factored representations for partially observable Markov decision processes
    Sallans, B
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 12, 2000, 12 : 1050 - 1056
  • [39] Partially Observable Markov Decision Processes: A Geometric Technique and Analysis
    Zhang, Hao
    OPERATIONS RESEARCH, 2010, 58 (01) : 214 - 228
  • [40] Partially Observable Risk-Sensitive Markov Decision Processes
    Baeuerle, Nicole
    Rieder, Ulrich
    MATHEMATICS OF OPERATIONS RESEARCH, 2017, 42 (04) : 1180 - 1196