Approximation of average cost optimal policies for general Markov decision processes with unbounded costs

Cited by: 0

Authors:

Evgueni Gordienko

Raúl Montes-De-Oca

Adolfo Minjárez-Sosa

Affiliations:

[1] Universidad Autónoma Metropolitana-Iztapalapa, Departamento de Matemáticas

[2] Universidad de Sonora, Departamento de Matemáticas

Source:

Mathematical Methods of Operations Research | 1997 / Vol. 45

Keywords:

Markov Decision Process; Average Cost Criterion; Value Iteration; Approximation of Optimal Policy; Geometrical Convergence;

DOI:

Not available

Abstract:

The aim of the paper is to show that, for Markov decision processes with a Borel state space and possibly unbounded costs, Lyapunov-like ergodicity conditions allow an average cost optimal policy to be approximated by solving n-stage optimization problems (n = 1, 2, ...). The approach used ensures an exponential rate of convergence. Approximations of this type would be useful for finding adaptive control procedures and for estimating the stability of an optimal control under disturbances of the transition probability.
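
As a rough illustration of approximating an average cost optimal policy through n-stage problems, the sketch below runs plain value iteration on a toy finite MDP. It is not the paper's construction: the paper treats Borel state spaces and unbounded costs, whereas the two-state, two-action model, the names `n_stage_policy`, `cost`, and `trans`, and all numbers here are assumptions made up for the example.

```python
import numpy as np


def n_stage_policy(cost, trans, n):
    """Solve the n-stage problem by value iteration (hypothetical helper).

    cost[x, a]     : one-stage cost of action a in state x
    trans[a, x, y] : probability of moving from x to y under action a
    Returns the n-stage value function and the minimizing first-stage policy.
    """
    num_states, _ = cost.shape
    v = np.zeros(num_states)              # zero terminal cost
    policy = np.zeros(num_states, dtype=int)
    for _ in range(n):
        # q[x, a] = cost[x, a] + sum_y trans[a, x, y] * v[y]
        q = cost + np.einsum("axy,y->xa", trans, v)
        policy = q.argmin(axis=1)
        v = q.min(axis=1)
    return v, policy


# Toy model (assumed): every transition probability is positive, so the
# chain is ergodic under any policy and the iterates settle down quickly.
cost = np.array([[1.0, 2.0],
                 [3.0, 0.5]])
trans = np.array([[[0.9, 0.1], [0.5, 0.5]],   # action 0
                  [[0.2, 0.8], [0.3, 0.7]]])  # action 1

v_n, policy_n = n_stage_policy(cost, trans, 60)
v_next, _ = n_stage_policy(cost, trans, 61)

# The increments v_{n+1} - v_n become nearly constant across states; that
# constant estimates the optimal average cost, and the spread (span) of the
# increments shrinks geometrically in n for this toy model.
gain_estimate = v_next - v_n
span = gain_estimate.max() - gain_estimate.min()
print(policy_n, gain_estimate, span)
```

In this toy model the n-stage minimizer stops changing after a few iterations, which mirrors, in a much simpler setting, the exponential convergence rate stated in the abstract.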

Pages: 245-263

Page count: 18

