The optimal probability of the risk for finite horizon partially observable Markov decision processes

Cited by: 0
Authors
Wen, Xian [1]
Huo, Haifeng [1]
Cui, Jinhua [1]
Affiliations
[1] Guangxi Univ Sci & Technol, Sch Sci, Liuzhou 541006, Peoples R China
Source
AIMS MATHEMATICS | 2023, Vol. 8, Iss. 12
Funding
National Natural Science Foundation of China;
Keywords
partially observable Markov decision processes; risk probability criterion; Bayes operator; optimal policy; MINIMIZATION; MODELS; VARIANCE;
DOI
10.3934/math.20231455
CLC classification
O29 [Applied Mathematics];
Discipline code
070104;
Abstract
This paper investigates risk probability optimality for finite-horizon partially observable discrete-time Markov decision processes (POMDPs). Unlike the classical expected-reward problem, the criterion optimized here is the probability that the total reward does not exceed a preset goal value. Using the Bayes operator and the filter equations, the risk probability optimization problem is equivalently reformulated as a filtered Markov decision process. By developing a value iteration technique, the optimality equation satisfied by the value function is established and the existence of a risk-probability-optimal policy is proven. Finally, an example illustrates the use of the value iteration algorithm to compute the value function and an optimal policy.
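The backward value iteration described in the abstract can be sketched on a toy model. Note the paper itself handles partial observability by iterating on the filtered (belief-state) process via the Bayes operator; the fully observed MDP below, including its transition table `P`, rewards `R`, and the goal level `lam`, is a hypothetical simplification used only to illustrate the recursion for the probability that total reward stays at or below the goal.

```python
from functools import lru_cache

# Hypothetical toy MDP: two states {0, 1}, two actions {0, 1}.
# Integer rewards keep the "remaining goal" argument integer-valued.
P = {  # P[(state, action)] -> list of (next_state, probability)
    (0, 0): [(0, 0.7), (1, 0.3)],
    (0, 1): [(0, 0.4), (1, 0.6)],
    (1, 0): [(0, 0.5), (1, 0.5)],
    (1, 1): [(0, 0.2), (1, 0.8)],
}
R = {(0, 0): 1, (0, 1): 2, (1, 0): 0, (1, 1): 1}  # one-step rewards
ACTIONS = (0, 1)

@lru_cache(maxsize=None)
def risk(n, x, lam):
    """Minimal probability that the total reward over n remaining
    steps, starting from state x, does not exceed the goal level lam."""
    if n == 0:
        # No steps left: total reward 0 is <= lam iff lam >= 0.
        return 1.0 if lam >= 0 else 0.0
    # Optimality equation: minimize over actions the expected
    # continuation risk, with the goal reduced by the reward earned.
    return min(
        sum(p * risk(n - 1, y, lam - R[(x, a)]) for y, p in P[(x, a)])
        for a in ACTIONS
    )

def best_action(n, x, lam):
    """An action attaining the minimum in the optimality equation."""
    return min(
        ACTIONS,
        key=lambda a: sum(p * risk(n - 1, y, lam - R[(x, a)])
                          for y, p in P[(x, a)]),
    )
```

In the partially observable setting of the paper, the state argument `x` would be replaced by the posterior (belief) distribution produced by the Bayes operator, and the same recursion runs on that filtered process.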
Pages: 28435-28449
Page count: 15
Related papers
25 in total
  • [1] [Anonymous], 1970, FDN NONSTATIONARY DY
  • [2] Bäuerle, N., 2011, Universitext, p. 1, DOI 10.1007/978-3-642-18324-9
  • [3] Bertsekas, D. P., 1996, Stochastic Optimal Control: The Discrete-Time Case, Vol. 5
  • [4] Drake, A., 1962, Observation of a Markov Process Through a Noisy Channel
  • [5] Feinberg, Eugene A.; Kasyanov, Pavlo O.; Zgurovsky, Michael Z. Partially observable total-cost Markov decision processes with weakly continuous transition probabilities. Mathematics of Operations Research, 2016, 41(2): 656-681
  • [6] Guo, X., 2009, STOCH MOD APPL PROBA, Vol. 62, p. 1, DOI 10.1007/978-3-642-02547-1
  • [7] Haklidir, Mehmet; Temeltas, Hakan. Guided soft actor critic: a guided deep reinforcement learning approach for partially observable Markov decision processes. IEEE Access, 2021, 9: 159672-159683
  • [8] Hernandez-Lerma, O., 1989, ADAPTIVE MARKOV CONT, Vol. 79
  • [9] Huang, Xiangxiang; Zou, Xiaolong; Guo, Xianping. A minimization problem of the risk probability in first passage semi-Markov decision processes with loss rates. Science China-Mathematics, 2015, 58(9): 1923-1938
  • [10] Huang, Yonghui; Guo, Xianping; Li, Zhongfei. Minimum risk probability for finite horizon semi-Markov decision processes. Journal of Mathematical Analysis and Applications, 2013, 402(1): 378-391