The optimal probability of the risk for finite horizon partially observable Markov decision processes

Cited by: 0
Authors
Wen, Xian [1]
Huo, Haifeng [1]
Cui, Jinhua [1]
Affiliations
[1] Guangxi Univ Sci & Technol, Sch Sci, Liuzhou 541006, Peoples R China
Source
AIMS MATHEMATICS | 2023, Vol. 8, Iss. 12
Funding
National Natural Science Foundation of China;
Keywords
partially observable Markov decision processes; risk probability criterion; Bayes operator; optimal policy; MINIMIZATION; MODELS; VARIANCE;
DOI
10.3934/math.20231455
CLC classification
O29 [Applied Mathematics];
Discipline code
070104;
Abstract
This paper investigates risk probability optimality for finite-horizon partially observable discrete-time Markov decision processes (POMDPs). Unlike the classical expected-reward problem, the criterion optimized here is the probability that the total reward does not exceed a preset goal value. Using the Bayes operator and the filter equations, the risk probability optimization problem is equivalently reformulated as a filtered Markov decision process. By developing a value iteration technique, the optimality equation satisfied by the value function is established and the existence of a risk-probability-optimal policy is proven. Finally, an example illustrates the use of the value iteration algorithm to compute the value function and an optimal policy.
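The backward value iteration described in the abstract can be sketched on a toy model. Note the paper itself handles partial observability by iterating on the filtered (belief-state) process via the Bayes operator; the fully observed MDP below, including its transition table `P`, rewards `R`, and the goal level `lam`, is a hypothetical simplification used only to illustrate the recursion for the probability that total reward stays at or below the goal.

```python
from functools import lru_cache

# Hypothetical toy MDP: two states {0, 1}, two actions {0, 1}.
# Integer rewards keep the "remaining goal" argument integer-valued.
P = {  # P[(state, action)] -> list of (next_state, probability)
    (0, 0): [(0, 0.7), (1, 0.3)],
    (0, 1): [(0, 0.4), (1, 0.6)],
    (1, 0): [(0, 0.5), (1, 0.5)],
    (1, 1): [(0, 0.2), (1, 0.8)],
}
R = {(0, 0): 1, (0, 1): 2, (1, 0): 0, (1, 1): 1}  # one-step rewards
ACTIONS = (0, 1)

@lru_cache(maxsize=None)
def risk(n, x, lam):
    """Minimal probability that the total reward over n remaining
    steps, starting from state x, does not exceed the goal level lam."""
    if n == 0:
        # No steps left: total reward 0 is <= lam iff lam >= 0.
        return 1.0 if lam >= 0 else 0.0
    # Optimality equation: minimize over actions the expected
    # continuation risk, with the goal reduced by the reward earned.
    return min(
        sum(p * risk(n - 1, y, lam - R[(x, a)]) for y, p in P[(x, a)])
        for a in ACTIONS
    )

def best_action(n, x, lam):
    """An action attaining the minimum in the optimality equation."""
    return min(
        ACTIONS,
        key=lambda a: sum(p * risk(n - 1, y, lam - R[(x, a)])
                          for y, p in P[(x, a)]),
    )
```

In the partially observable setting of the paper, the state argument `x` would be replaced by the posterior (belief) distribution produced by the Bayes operator, and the same recursion runs on that filtered process.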
Pages: 28435-28449
Page count: 15
Related papers
25 in total
  • [1] [Anonymous], 1970, FDN NONSTATIONARY DY
  • [2] Bäuerle, N., 2011, Universitext, p. 1, DOI 10.1007/978-3-642-18324-9
  • [3] Bertsekas, D. P., 1996, Stochastic Optimal Control: The Discrete-Time Case, Vol. 5
  • [4] Drake, A., 1962, Observation of a Markov Process Through a Noisy Channel
  • [5] Feinberg, Eugene A.; Kasyanov, Pavlo O.; Zgurovsky, Michael Z. Partially observable total-cost Markov decision processes with weakly continuous transition probabilities. Mathematics of Operations Research, 2016, 41(2): 656-681
  • [6] Guo, X., 2009, STOCH MOD APPL PROBA, Vol. 62, p. 1, DOI 10.1007/978-3-642-02547-1
  • [7] Haklidir, Mehmet; Temeltas, Hakan. Guided soft actor critic: a guided deep reinforcement learning approach for partially observable Markov decision processes. IEEE Access, 2021, 9: 159672-159683
  • [8] Hernandez-Lerma, O., 1989, ADAPTIVE MARKOV CONT, Vol. 79
  • [9] Huang, Xiangxiang; Zou, Xiaolong; Guo, Xianping. A minimization problem of the risk probability in first passage semi-Markov decision processes with loss rates. Science China-Mathematics, 2015, 58(9): 1923-1938
  • [10] Huang, Yonghui; Guo, Xianping; Li, Zhongfei. Minimum risk probability for finite horizon semi-Markov decision processes. Journal of Mathematical Analysis and Applications, 2013, 402(1): 378-391