The risk probability criterion for discounted continuous-time Markov decision processes

Cited: 17
Authors
Huo, Haifeng [1 ]
Zou, Xiaolong [2 ]
Guo, Xianping [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Math, Guangzhou 510275, Guangdong, Peoples R China
[2] Guangzhou Univ, Sch Econ & Stat, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Continuous-time Markov decision processes; Risk probability criterion; Value iteration; Optimality equation; Optimal policy; MODELS; RATES; OPTIMALITY; POLICIES;
DOI
10.1007/s10626-017-0257-6
CLC number
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
In this paper, we consider the risk probability minimization problem for infinite-horizon discounted continuous-time Markov decision processes (CTMDPs) with unbounded transition rates. First, we introduce a class of policies that depend on histories augmented with reward levels. We then construct the corresponding probability spaces and establish the non-explosion of the state process. Second, under suitable conditions we use an iteration technique to prove that the value function is a solution to the optimality equation for the probability criterion, and we obtain a value iteration algorithm to compute (or at least approximate) the value function. Furthermore, under an additional condition we establish the uniqueness of the solution to the optimality equation and the existence of an optimal policy. Finally, we illustrate our results with two examples: the first verifies our conditions for CTMDPs with unbounded transition rates, and the second demonstrates the numerical calculation of the value function and an optimal policy.
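
To make the value iteration scheme concrete, below is a minimal numerical sketch of the risk-probability optimality equation described in the abstract, in a discrete-time analogue (e.g., as obtained after uniformization of the CTMDP). All problem data (the transition array P, rewards r, discount factor beta, and the level grid lam_grid) are hypothetical placeholders, not taken from the paper, and the sketch does not capture the paper's actual continuous-time setting with unbounded transition rates.

    import numpy as np

    # Hypothetical data (illustrative only, not from the paper): a 2-state,
    # 2-action discrete-time analogue of the CTMDP, e.g. after uniformization.
    n_states, n_actions = 2, 2
    P = np.array([[[0.9, 0.1], [0.4, 0.6]],   # P[x, a, y]: transition probabilities
                  [[0.3, 0.7], [0.8, 0.2]]])
    r = np.array([[1.0, 0.5],                 # r[x, a]: one-step reward
                  [0.2, 1.5]])
    beta = 0.9                                # discount factor
    lam_grid = np.linspace(-1.0, 20.0, 421)   # grid of reward levels lambda

    def risk_value_iteration(tol=1e-8, max_iter=2000):
        # Iterate the risk-probability optimality equation
        #   U(x, lam) = min_a sum_y P[x, a, y] * U(y, (lam - r[x, a]) / beta),
        # starting from U_0(x, lam) = 1{lam >= 0}.  U_n(x, lam) is the minimal
        # probability that the n-step discounted reward does not exceed lam,
        # and U_n converges to the infinite-horizon value function.
        U = np.tile((lam_grid >= 0.0).astype(float), (n_states, 1))
        for _ in range(max_iter):
            U_new = np.empty_like(U)
            for x in range(n_states):
                best = np.full(lam_grid.size, np.inf)
                for a in range(n_actions):
                    lam_next = (lam_grid - r[x, a]) / beta  # shifted reward level
                    cand = np.zeros(lam_grid.size)
                    for y in range(n_states):
                        # interpolate U(y, .) on the level grid; rewards here are
                        # nonnegative, so the risk probability is 0 (resp. 1)
                        # below (resp. above) the grid
                        cand += P[x, a, y] * np.interp(
                            lam_next, lam_grid, U[y], left=0.0, right=1.0)
                    best = np.minimum(best, cand)
                U_new[x] = best
            if np.max(np.abs(U_new - U)) < tol:
                break
            U = U_new
        return U_new

    U_star = risk_value_iteration()
    # U_star[x, k] approximates the minimal probability that the total
    # discounted reward, starting from state x, does not exceed lam_grid[k].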
Pages: 675-699
Page count: 25