Approximate receding horizon approach for Markov decision processes: average reward case

Cited by: 20
Authors
Chang, HS
Marcus, SI [1]
Affiliations
[1] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
[2] Sogang Univ, Dept Comp Sci & Engn, Seoul, South Korea
Keywords
Markov decision process; receding horizon control; infinite-horizon average reward; policy improvement; rollout; ergodicity
DOI
10.1016/S0022-247X(03)00506-7
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
We consider an approximation scheme for solving Markov decision processes (MDPs) with countable state space, finite action space, and bounded rewards that uses an approximate solution of a fixed finite-horizon sub-MDP of a given infinite-horizon MDP to create a stationary policy, which we call "approximate receding horizon control." We first analyze the performance of approximate receding horizon control for the infinite-horizon average reward under an ergodicity assumption; this analysis also generalizes the result obtained by White (J. Oper. Res. Soc. 33 (1982) 253-259). We then study two examples of approximate receding horizon control obtained via lower bounds on the exact solution of the sub-MDP. The first control policy is based on a finite-horizon approximation of Howard's policy improvement of a single policy, and the second is based on a generalization of single-policy improvement to multiple policies. Along the way, we also provide a simple alternative proof of policy improvement for countable state spaces. We finally discuss practical implementations of these schemes via simulation. (C) 2003 Elsevier Inc. All rights reserved.
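The abstract describes the scheme only in words. As a rough illustration, the Python sketch below assumes NumPy and a small finite MDP in place of the paper's countable state space. It computes exact receding horizon control by backward induction over a fixed horizon H, taking the greedy first action with respect to the optimal (H-1)-stage value. It also computes an approximate version that replaces that value with a lower bound: the (H-1)-stage value of one base policy, or the pointwise maximum over several base policies. All function names are hypothetical. The tail values are computed exactly rather than by simulation, and the multi-policy form is an assumption, not the paper's stated definition.

```python
import numpy as np

# Sketch only: a small finite MDP stands in for the paper's countable-state model.
# P[a, x, y] = probability of moving from state x to y under action a.
# R[x, a]    = bounded one-step reward for taking action a in state x.


def optimal_values(P, R, horizon):
    """Backward induction: optimal total reward of the `horizon`-stage sub-MDP."""
    V = np.zeros(R.shape[0])
    for _ in range(horizon):
        # Q[x, a] = R[x, a] + sum_y P[a, x, y] * V[y]
        Q = R + np.einsum("axy,y->xa", P, V)
        V = Q.max(axis=1)
    return V


def policy_values(P, R, policy, horizon):
    """Total reward over `horizon` stages of a stationary deterministic policy."""
    policy = np.asarray(policy)
    states = np.arange(R.shape[0])
    P_pi = P[policy, states, :]  # P_pi[x, y] = P[policy[x], x, y]
    R_pi = R[states, policy]     # R_pi[x]    = R[x, policy[x]]
    V = np.zeros(R.shape[0])
    for _ in range(horizon):
        V = R_pi + P_pi @ V
    return V


def greedy_first_action(P, R, V_tail):
    """One-step lookahead: argmax_a R[x, a] + sum_y P[a, x, y] * V_tail[y]."""
    Q = R + np.einsum("axy,y->xa", P, V_tail)
    return Q.argmax(axis=1)  # ties go to the lowest-numbered action


def receding_horizon_policy(P, R, H):
    """Exact receding horizon control: first action of an optimal H-stage policy."""
    return greedy_first_action(P, R, optimal_values(P, R, H - 1))


def rollout_policy(P, R, base_policies, H):
    """Approximate receding horizon control using a lower bound on V*_{H-1}.

    With one base policy this is a finite-horizon version of single-policy
    improvement; with several, the tail is the pointwise max of their
    (H-1)-stage values (an assumed form of the multi-policy generalization).
    """
    tails = [policy_values(P, R, pi, H - 1) for pi in base_policies]
    return greedy_first_action(P, R, np.max(tails, axis=0))


def average_reward_estimate(P, R, policy, T=1000):
    """Hypothetical helper: T-stage total reward divided by T, per start state."""
    return policy_values(P, R, policy, T) / T


# Two states; action 0 = stay, action 1 = switch. Only staying in state 0 pays.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[1.0, 0.0], [0.0, 0.0]])
print(receding_horizon_policy(P, R, 1))   # myopic: [0 0]
print(receding_horizon_policy(P, R, 2))   # [0 1]
print(rollout_policy(P, R, [[0, 0]], 2))  # improves on "always stay": [0 1]
```

In this toy example the myopic (H = 1) policy leaves state 1 stuck at zero reward, while a two-stage lookahead, whether it uses the exact optimal tail or only the tail value of the "always stay" base policy, switches back to the rewarding state.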
Pages: 636-651
Number of pages: 16