Stochastic optimization of controlled partially observable Markov decision processes

Cited: 0
|
Authors
Bartlett, PL [1 ]
Baxter, J [1 ]
Institution
[1] Australian Natl Univ, Res Sch Info Sci & Eng, Canberra, ACT 0200, Australia
Source
PROCEEDINGS OF THE 39TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5 | 2000
Keywords
DOI
None
Chinese Library Classification
TP [Automation, Computer Technology];
Discipline Code
0812 ;
Abstract
We introduce an on-line algorithm for finding local maxima of the average reward in a Partially Observable Markov Decision Process (POMDP) controlled by a parameterized policy. Optimization is over the parameters of the policy. The algorithm's chief advantages are that it requires only a single sample path of the POMDP, it uses only one free parameter β ∈ [0, 1), which has a natural interpretation in terms of a bias-variance trade-off, and it requires no knowledge of the underlying state. In addition, the algorithm can be applied to infinite state, control and observation spaces. We prove almost-sure convergence of our algorithm, and show how the correct setting of β is related to the mixing time of the Markov chain induced by the POMDP.
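The abstract describes a single-sample-path policy-gradient estimator whose only free parameter, β ∈ [0, 1), discounts an eligibility trace of score functions. A minimal sketch of that kind of estimator is below; the function names (`gpomdp_estimate`, `sample_action`, `grad_log_policy`, `step`) and the toy environment are illustrative assumptions, not the paper's own notation or code.

```python
import numpy as np

def gpomdp_estimate(sample_action, grad_log_policy, step, obs0, theta,
                    beta=0.9, horizon=10_000, rng=None):
    """Sketch of a single-path gradient estimate for the average reward.

    beta in [0, 1) trades bias (small beta) against variance (beta near 1);
    the paper relates the right choice to the mixing time of the induced chain.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    z = np.zeros_like(theta, dtype=float)      # eligibility trace of grad-log-probs
    delta = np.zeros_like(theta, dtype=float)  # running gradient estimate
    obs = obs0
    for t in range(horizon):
        action = sample_action(theta, obs, rng)
        z = beta * z + grad_log_policy(theta, obs, action)
        obs, reward = step(obs, action, rng)
        # Incremental average of reward-weighted traces.
        delta += (reward * z - delta) / (t + 1)
    return delta

# Toy problem (hypothetical): one fixed observation, two actions under a
# softmax policy; action 0 always pays reward 1, action 1 pays 0.
def sample_action(theta, obs, rng):
    p = np.exp(theta - theta.max())
    p /= p.sum()
    return int(rng.choice(2, p=p))

def grad_log_policy(theta, obs, action):
    p = np.exp(theta - theta.max())
    p /= p.sum()
    g = -p
    g[action] += 1.0
    return g

def step(obs, action, rng):
    return obs, (1.0 if action == 0 else 0.0)

grad = gpomdp_estimate(sample_action, grad_log_policy, step,
                       obs0=0, theta=np.zeros(2), beta=0.9, horizon=20_000)
# The estimate should favour raising the logit of the rewarding action.
```

Note that only the policy's action probabilities and the observed rewards enter the update; no knowledge of the underlying state is used, matching the single-path, state-free character claimed in the abstract.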
Pages: 124 - 129
Page count: 6
Related papers
50 in total
  • [31] Qualitative Analysis of Partially-Observable Markov Decision Processes
    Chatterjee, Krishnendu
    Doyen, Laurent
    Henzinger, Thomas A.
    MATHEMATICAL FOUNDATIONS OF COMPUTER SCIENCE 2010, 2010, 6281 : 258 - 269
  • [32] Equivalence Relations in Fully and Partially Observable Markov Decision Processes
    Castro, Pablo Samuel
    Panangaden, Prakash
    Precup, Doina
    21ST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-09), PROCEEDINGS, 2009, : 1653 - 1658
  • [33] Recursively-Constrained Partially Observable Markov Decision Processes
    Ho, Qi Heng
    Becker, Tyler
    Kraske, Benjamin
    Laouar, Zakariya
    Feather, Martin S.
    Rossi, Federico
    Lahijanian, Morteza
    Sunberg, Zachary
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2024, 244 : 1658 - 1680
  • [34] A Fast Approximation Method for Partially Observable Markov Decision Processes
    LIU Bingbing
    KANG Yu
    JIANG Xiaofeng
    QIN Jiahu
Journal of Systems Science & Complexity, 2018, 31 (06) : 1423 - 1436
  • [35] Active Chemical Sensing With Partially Observable Markov Decision Processes
    Gosangi, Rakesh
    Gutierrez-Osuna, Ricardo
    OLFACTION AND ELECTRONIC NOSE, PROCEEDINGS, 2009, 1137 : 562 - 565
  • [36] Reinforcement learning algorithm for partially observable Markov decision processes
    Wang, Xue-Ning
    He, Han-Gen
    Xu, Xin
Kongzhi yu Juece/Control and Decision, 2004, 19 (11) : 1263 - 1266
  • [37] Partially Observable Markov Decision Processes and Performance Sensitivity Analysis
    Li, Yanjie
    Yin, Baoqun
    Xi, Hongsheng
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2008, 38 (06): : 1645 - 1651
  • [38] Learning factored representations for partially observable Markov decision processes
    Sallans, B
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 12, 2000, 12 : 1050 - 1056
  • [39] Partially Observable Markov Decision Processes: A Geometric Technique and Analysis
    Zhang, Hao
    OPERATIONS RESEARCH, 2010, 58 (01) : 214 - 228
  • [40] Partially Observable Risk-Sensitive Markov Decision Processes
    Baeuerle, Nicole
    Rieder, Ulrich
    MATHEMATICS OF OPERATIONS RESEARCH, 2017, 42 (04) : 1180 - 1196