The Transformation Method for Continuous-Time Markov Decision Processes

Authors
Alexey Piunovskiy
Yi Zhang
Affiliation
[1] University of Liverpool, Department of Mathematical Sciences
Source
Journal of Optimization Theory and Applications | 2012, Volume 154
Keywords
Discrete-time Markov decision process; Continuous-time Markov decision process; Unbounded transition rates; Transformation method; History-dependent policies;
Abstract
In this paper, we show that a discounted continuous-time Markov decision process in Borel spaces with randomized history-dependent policies, arbitrarily unbounded transition rates and a non-negative reward rate is equivalent to a discrete-time Markov decision process. Based on a completely new proof, which does not involve Kolmogorov’s forward equation, it is shown that the value function for both models is given by the minimal non-negative solution to the same Bellman equation. A verifiable necessary and sufficient condition for the finiteness of this value function is given, which induces a new condition for the non-explosion of the underlying controlled process.
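As a hedged illustration (the notation below is assumed for this sketch and is not quoted from the paper): for a discount rate \alpha > 0, a non-negative reward rate r(x,a), and a transition kernel q(\mathrm{d}y \mid x,a) with total exit rate q_x(a), a Bellman equation of the kind referred to above typically takes the form

\[
V(x) \;=\; \sup_{a \in A(x)} \frac{r(x,a) + \int_{E \setminus \{x\}} V(y)\, q(\mathrm{d}y \mid x,a)}{\alpha + q_x(a)},
\]

which is simultaneously the optimality equation of a discrete-time Markov decision process with one-step reward r(x,a)/(\alpha + q_x(a)) and sub-stochastic transition kernel q(\mathrm{d}y \mid x,a)/(\alpha + q_x(a)); under the abstract's assumptions, the value function of both models is the minimal non-negative solution of this equation.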
Pages: 691–712
Number of pages: 22