Stochastic optimization of controlled partially observable Markov decision processes

Citations: 0
Authors
Bartlett, PL [1 ]
Baxter, J [1 ]
Affiliation
[1] Australian Natl Univ, Res Sch Info Sci & Eng, Canberra, ACT 0200, Australia
Source
PROCEEDINGS OF THE 39TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5 | 2000
Keywords
DOI
None available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812 ;
Abstract
We introduce an on-line algorithm for finding local maxima of the average reward in a Partially Observable Markov Decision Process (POMDP) controlled by a parameterized policy. Optimization is over the parameters of the policy. The algorithm's chief advantages are that it requires only a single sample path of the POMDP, it uses only one free parameter β ∈ [0, 1), which has a natural interpretation in terms of a bias-variance trade-off, and it requires no knowledge of the underlying state. In addition, the algorithm can be applied to infinite state, control and observation spaces. We prove almost-sure convergence of our algorithm, and show how the correct setting of β is related to the mixing time of the Markov chain induced by the POMDP.
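The abstract describes a policy-gradient estimator driven by a single sample path, a discount-style parameter β ∈ [0, 1), and no access to the hidden state. The sketch below illustrates that style of estimator: an eligibility trace weighted by β accumulates log-policy gradients computed from observations only, and a running average of reward-weighted traces forms the gradient estimate. The two-state POMDP, observation model, and softmax policy here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy POMDP (assumed for illustration): 2 hidden states, 2 observations,
# 2 actions. P[a][s, s'] are transition probabilities, r[s] the reward,
# O[s, y] the observation probabilities.
n_states, n_obs, n_actions = 2, 2, 2
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.1, 0.9], [0.8, 0.2]]])
r = np.array([1.0, -1.0])
O = np.array([[0.8, 0.2], [0.3, 0.7]])

def policy(theta, y):
    """Softmax policy over actions, conditioned on the observation only."""
    logits = theta[y]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def gradient_estimate(theta, beta=0.9, T=50_000):
    """Estimate the average-reward gradient from one sample path.

    beta in [0, 1) trades bias against variance: larger beta means a
    longer-memory eligibility trace (less bias, more variance).
    """
    z = np.zeros_like(theta)      # eligibility trace
    delta = np.zeros_like(theta)  # running gradient estimate
    s = 0                         # hidden state, never read by the policy
    for t in range(T):
        y = rng.choice(n_obs, p=O[s])
        p = policy(theta, y)
        a = rng.choice(n_actions, p=p)
        # grad of log mu(a | theta, y): one-hot minus softmax probabilities
        grad_log = np.zeros_like(theta)
        grad_log[y] = -p
        grad_log[y, a] += 1.0
        z = beta * z + grad_log
        s = rng.choice(n_states, p=P[a][s])
        delta += (r[s] * z - delta) / (t + 1)
    return delta

theta = np.zeros((n_obs, n_actions))
grad = gradient_estimate(theta)
```

The estimate `grad` can then drive stochastic gradient ascent on θ; note that the update loop touches only observations, actions, and rewards, never the hidden state `s` directly.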
Pages: 124-129
Page count: 6