Optimal decisions for continuous time Markov decision processes over finite planning horizons

被引:5
作者
Buchholz, Peter [1 ]
Dohndorf, Iryna [1 ]
Scheftelowitsch, Dimitri [1 ]
机构
[1] Tech Univ Dortmund, Informat 4, D-44221 Dortmund, Germany
关键词
Continuous time Markov decision process; Finite horizon; Uniformization; Numerical techniques; Optimization; CHAINS; PROBABILITIES; OPTIMIZATION; ALGORITHMS; STATE;
D O I
10.1016/j.cor.2016.08.003
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
The computation of epsilon-optimal policies for continuous time Markov decision processes (CTMDPs) over finite time intervals is a sophisticated problem because the optimal policy may change at arbitrary times. Numerical algorithms based on time discretization or uniformization have been proposed for the computation of optimal policies. The uniformization based algorithm has shown to be more reliable and often also more efficient but is currently only available for processes where the gain or reward does not depend on the decision taken in a state. In this paper, we present two new uniformization based algorithms for computing epsilon-optimal policies for CTMDPs with decision dependent rewards over a finite time horizon. Due to a new and tighter upper bound the newly proposed algorithms cannot only be applied for decision dependent rewards, they also outperform the available approach for rewards that do not depend on the decision. In particular for models where the policy only rarely changes, optimal policies can be computed much faster. (C) 2016 Elsevier Ltd. All rights reserved.
引用
收藏
页码:267 / 278
页数:12
相关论文
共 24 条
[1]  
[Anonymous], 2007, DYNAMIC PROGRAMMING
[2]  
[Anonymous], 2014, Stochastic Processes: Theory for Applications
[3]  
[Anonymous], 1994, Introduction to the Numerical Solutions of Markov Chains
[4]  
[Anonymous], 2016, DYNAMIC PROGRAMMING
[5]   On the Numerical Analysis of Inhomogeneous Continuous-Time Markov Chains [J].
Arns, M. ;
Buchholz, P. ;
Panchenko, A. .
INFORMS JOURNAL ON COMPUTING, 2010, 22 (03) :416-432
[6]   Efficient computation of time-bounded reachability probabilities in uniform continuous-time Markov decision processes [J].
Baier, C ;
Hermanns, H ;
Katoen, JP ;
Haverkort, BR .
THEORETICAL COMPUTER SCIENCE, 2005, 345 (01) :2-26
[7]   Model-checking algorithms for continuous-time Markov chains [J].
Baier, C ;
Haverkort, B ;
Hermanns, H ;
Katoen, JP .
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2003, 29 (06) :524-541
[8]   Simulation-Based Optimization Algorithms for Finite-Horizon Markov Decision Processes [J].
Bhatnagar, Shalabh ;
Abdulla, Mohammed Shahid .
SIMULATION-TRANSACTIONS OF THE SOCIETY FOR MODELING AND SIMULATION INTERNATIONAL, 2008, 84 (12) :577-600
[9]  
Buchholz Peter, 2011, Computer Aided Verification. Proceedings 23rd International Conference, CAV 2011, P225, DOI 10.1007/978-3-642-22110-1_19
[10]   Numerical analysis of continuous time Markov decision processes over finite horizons [J].
Buchholz, Peter ;
Schulz, Ingo .
COMPUTERS & OPERATIONS RESEARCH, 2011, 38 (03) :651-659