Approximate receding horizon approach for Markov decision processes: average reward case

Times cited: 20

Authors:

Chang, HS

Marcus, SI [1]

Affiliations:

[1] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA

[2] Sogang Univ, Dept Comp Sci & Engn, Seoul, South Korea

Source:

JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS | 2003 / Vol. 286 / No. 02

Keywords:

Markov decision process; receding horizon control; infinite-horizon average reward; policy improvement; rollout; ergodicity

DOI:

10.1016/S0022-247X(03)00506-7

Chinese Library Classification (CLC) number:

O29 [Applied mathematics]

Subject classification code:

070104

Abstract:

We consider an approximation scheme for solving Markov decision processes (MDPs) with countable state space, finite action space, and bounded rewards. The scheme uses an approximate solution of a fixed finite-horizon sub-MDP of a given infinite-horizon MDP to create a stationary policy, which we call "approximate receding horizon control." We first analyze the performance of approximate receding horizon control for the infinite-horizon average reward under an ergodicity assumption; this analysis also generalizes the result obtained by White (J. Oper. Res. Soc. 33 (1982) 253-259). We then study two examples of approximate receding horizon control obtained via lower bounds on the exact solution of the sub-MDP. The first control policy is based on a finite-horizon approximation of Howard's policy improvement of a single policy, and the second is based on a generalization of single-policy improvement to multiple policies. Along the way, we also provide a simple alternative proof of policy improvement for countable state spaces. We finally discuss practical implementations of these schemes via simulation. (C) 2003 Elsevier Inc. All rights reserved.
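The construction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses a hypothetical toy MDP with five states and two actions, whereas the paper treats countable state spaces. All transition probabilities are strictly positive, so every stationary policy induces an ergodic chain, and exact backward induction stands in for simulation. The helper names (`q_values`, `horizon_values`, `greedy`, `average_reward`), the horizon `H = 100` and the two constant base policies are likewise assumptions made for illustration.

In every case the stationary policy takes, in each state, the first action that is greedy with respect to an (H-1)-stage continuation value. That value is one of three things:

- the optimal value, which gives exact receding horizon control;
- the value of a single base policy;
- the pointwise maximum of several base policies' values.

The last two are lower bounds on the optimal continuation value, which is one plausible reading of the two examples the abstract mentions.

```python
import numpy as np

# Hypothetical toy MDP (not from the paper): S states, A actions, bounded rewards.
# P[a, x, y] = Pr(next state y | state x, action a); R[x, a] = one-step reward.
rng = np.random.default_rng(0)
S, A, H = 5, 2, 100
P = rng.random((A, S, S)) + 0.05           # strictly positive rows -> ergodic chains
P /= P.sum(axis=2, keepdims=True)
R = rng.random((S, A))


def q_values(W):
    """Q[x, a] = R[x, a] + sum_y P[a, x, y] * W[y]."""
    return R + np.einsum("axy,y->xa", P, W)


def horizon_values(n, policy=None):
    """n-stage total reward by backward induction: optimal if policy is None,
    otherwise the value of following the fixed stationary policy for n stages."""
    V = np.zeros(S)
    for _ in range(n):
        Q = q_values(V)
        V = Q.max(axis=1) if policy is None else Q[np.arange(S), policy]
    return V


def greedy(W):
    """Stationary policy that, in every state, takes the best first action
    given the (H-1)-stage continuation estimate W."""
    return q_values(W).argmax(axis=1)


def average_reward(policy):
    """Long-run average reward of a stationary policy on an ergodic chain:
    solve mu P_pi = mu with sum(mu) = 1, then return mu . r_pi."""
    idx = np.arange(S)
    P_pi, r_pi = P[policy, idx, :], R[idx, policy]
    M = np.vstack([P_pi.T - np.eye(S), np.ones(S)])
    b = np.append(np.zeros(S), 1.0)
    mu = np.linalg.lstsq(M, b, rcond=None)[0]
    return float(mu @ r_pi)


base = [np.zeros(S, dtype=int), np.ones(S, dtype=int)]  # hypothetical base policies

# Exact receding horizon control: first action of the H-horizon optimal policy.
pi_rh = greedy(horizon_values(H - 1))
# Single-policy improvement: continuation value of one base policy (a lower bound).
pi_single = greedy(horizon_values(H - 1, base[0]))
# Multiple-policy version: pointwise max of the base policies' values (still a lower bound).
W_multi = np.max([horizon_values(H - 1, pi) for pi in base], axis=0)
pi_multi = greedy(W_multi)

for name, pi in [("base 0", base[0]), ("base 1", base[1]),
                 ("single", pi_single), ("multi", pi_multi), ("receding", pi_rh)]:
    print(f"{name:9s} policy={pi} average reward={average_reward(pi):.4f}")
```

The toy chains mix quickly relative to this horizon. As a result, the single-policy variant behaves much like one step of Howard's policy improvement: the constant (H-1)g term of the base policy's value cancels in the argmax, because every row of P sums to one. In a practical setting the continuation values would instead be estimated by simulating the base policies, as the abstract indicates.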


Pages: 636-651

Number of pages: 16

50 entries in total

[21] MARKOV DECISION PROCESSES WITH TIME-VARYING DISCOUNT FACTORS AND RANDOM HORIZON [J].

Ilhuicatzi-Roldan, Rocio; Cruz-Suarez, Hugo; Chavez-Rodriguez, Selene.

KYBERNETIKA, 2017, 53 (01): 82-98

[22] Markov-type fuzzy decision processes with a discounted reward on a closed interval [J].

Kurano, M; Yasuda, M; Nakagami, J; Yoshida, Y.

EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 1996, 92 (03): 649-662

[23] Algorithm to identify and compute average optimal policies in multichain Markov decision processes [J].

Leizarowitz, A.

MATHEMATICS OF OPERATIONS RESEARCH, 2003, 28 (03): 553-586

[24] Mean-Variance Problems for Finite Horizon Semi-Markov Decision Processes [J].

Huang, Yonghui; Guo, Xianping.

Applied Mathematics & Optimization, 2015, 72: 233-259

[25] Mean-Variance Problems for Finite Horizon Semi-Markov Decision Processes [J].

Huang, Yonghui; Guo, Xianping.

APPLIED MATHEMATICS AND OPTIMIZATION, 2015, 72 (02): 233-259

[26] Threshold probability of non-terminal type in finite horizon Markov decision processes [J].

Kira, Akifumi; Ueno, Takayuki; Fujita, Toshiharu.

JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 2012, 386 (01): 461-472

[27] AN EXTENDED VERSION OF AVERAGE MARKOV DECISION PROCESSES ON DISCRETE SPACES UNDER FUZZY ENVIRONMENT [J].

Cruz-Suarez, Hugo; Montes-De-Oca, Raul; Ortega-Gutierrez, R. Israel.

KYBERNETIKA, 2023, 59 (01): 160-178

[28] Approximation of average cost optimal policies for general Markov decision processes with unbounded costs [J].

Gordienko, E; Montes-de-Oca, R; Minjarez-Sosa, A.

MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 1997, 45 (02): 245-263

[29] A note on the existence of optimal stationary policies for average Markov decision processes with countable states [J].

Xia, Li; Guo, Xianping; Cao, Xi-Ren.

AUTOMATICA, 2023, 151

[30] Optimality inequalities for average cost Markov decision processes and the stochastic cash balance problem [J].

Feinberg, Eugene A.; Lewis, Mark E.

MATHEMATICS OF OPERATIONS RESEARCH, 2007, 32 (04): 769-783
