The risk probability criterion for discounted continuous-time Markov decision processes

Cited: 17
Authors
Huo, Haifeng [1 ]
Zou, Xiaolong [2 ]
Guo, Xianping [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Math, Guangzhou 510275, Guangdong, Peoples R China
[2] Guangzhou Univ, Sch Econ & Stat, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Continuous-time Markov decision processes; Risk probability criterion; Value iteration; Optimality equation; Optimal policy; MODELS; RATES; OPTIMALITY; POLICIES;
DOI
10.1007/s10626-017-0257-6
CLC number
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
In this paper, we consider the risk probability minimization problem for infinite-horizon discounted continuous-time Markov decision processes (CTMDPs) with unbounded transition rates. First, we introduce a class of policies that depend on histories augmented with reward levels. We then construct the corresponding probability spaces and establish the non-explosion of the state process. Second, under suitable conditions we use an iteration technique to prove that the value function is a solution to the optimality equation for the probability criterion, and we obtain a value iteration algorithm to compute (or at least approximate) the value function. Furthermore, under an additional condition we establish the uniqueness of the solution to the optimality equation and the existence of an optimal policy. Finally, we illustrate our results with two examples: the first verifies our conditions for CTMDPs with unbounded transition rates, and the second demonstrates the numerical calculation of the value function and an optimal policy.
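
To make the value iteration scheme concrete, below is a minimal numerical sketch of the risk-probability optimality equation described in the abstract, in a discrete-time analogue (e.g., as obtained after uniformization of the CTMDP). All problem data (the transition array P, rewards r, discount factor beta, and the level grid lam_grid) are hypothetical placeholders, not taken from the paper, and the sketch does not capture the paper's actual continuous-time setting with unbounded transition rates.

    import numpy as np

    # Hypothetical data (illustrative only, not from the paper): a 2-state,
    # 2-action discrete-time analogue of the CTMDP, e.g. after uniformization.
    n_states, n_actions = 2, 2
    P = np.array([[[0.9, 0.1], [0.4, 0.6]],   # P[x, a, y]: transition probabilities
                  [[0.3, 0.7], [0.8, 0.2]]])
    r = np.array([[1.0, 0.5],                 # r[x, a]: one-step reward
                  [0.2, 1.5]])
    beta = 0.9                                # discount factor
    lam_grid = np.linspace(-1.0, 20.0, 421)   # grid of reward levels lambda

    def risk_value_iteration(tol=1e-8, max_iter=2000):
        # Iterate the risk-probability optimality equation
        #   U(x, lam) = min_a sum_y P[x, a, y] * U(y, (lam - r[x, a]) / beta),
        # starting from U_0(x, lam) = 1{lam >= 0}.  U_n(x, lam) is the minimal
        # probability that the n-step discounted reward does not exceed lam,
        # and U_n converges to the infinite-horizon value function.
        U = np.tile((lam_grid >= 0.0).astype(float), (n_states, 1))
        for _ in range(max_iter):
            U_new = np.empty_like(U)
            for x in range(n_states):
                best = np.full(lam_grid.size, np.inf)
                for a in range(n_actions):
                    lam_next = (lam_grid - r[x, a]) / beta  # shifted reward level
                    cand = np.zeros(lam_grid.size)
                    for y in range(n_states):
                        # interpolate U(y, .) on the level grid; rewards here are
                        # nonnegative, so the risk probability is 0 (resp. 1)
                        # below (resp. above) the grid
                        cand += P[x, a, y] * np.interp(
                            lam_next, lam_grid, U[y], left=0.0, right=1.0)
                    best = np.minimum(best, cand)
                U_new[x] = best
            if np.max(np.abs(U_new - U)) < tol:
                break
            U = U_new
        return U_new

    U_star = risk_value_iteration()
    # U_star[x, k] approximates the minimal probability that the total
    # discounted reward, starting from state x, does not exceed lam_grid[k].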
Pages: 675-699
Page count: 25