Approximation of average cost optimal policies for general Markov decision processes with unbounded costs

Cited by: 0
Authors
Evgueni Gordienko
Raúl Montes-De-Oca
Adolfo Minjárez-Sosa
Affiliations
[1] Universidad Autónoma Metropolitana-Iztapalapa, Departamento de Matemáticas
[2] Universidad de Sonora, Departamento de Matemáticas
Source
Mathematical Methods of Operations Research | 1997 / Vol. 45
Keywords
Markov Decision Process; Average Cost Criterion; Value Iteration; Approximation of Optimal Policy; Geometrical Convergence;
DOI
Not available
Abstract
The aim of the paper is to show that Lyapunov-like ergodicity conditions on Markov decision processes with Borel state space and possibly unbounded cost make it possible to approximate an average cost optimal policy by solving n-stage optimization problems (n = 1, 2, ...). The approach ensures an exponential rate of convergence. An approximation of this type would be useful for finding adaptive control procedures and for estimating the stability of an optimal control under disturbances of the transition probability.
Pages: 245 - 263
Number of pages: 18
Related papers
50 in total
  • [21] A note on optimality conditions for continuous-time Markov decision processes with average cost criterion
    Guo, XP
    Liu, K
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2001, 46 (12) : 1984 - 1989
  • [22] Efficient Policies for Stationary Possibilistic Markov Decision Processes
    Ben Amor, Nahla
    El Khalfi, Zeineb
    Fargier, Helene
    Sabbadin, Regis
    SYMBOLIC AND QUANTITATIVE APPROACHES TO REASONING WITH UNCERTAINTY, ECSQARU 2017, 2017, 10369 : 306 - 317
  • [23] A note on deterministic approximation of discounted Markov decision processes
    Cruz-Suarez, Hugo
    Gordienko, Evgueni
    Montes-de-Oca, Raul
    APPLIED MATHEMATICS LETTERS, 2009, 22 (08) : 1252 - 1256
  • [24] Solving average cost Markov decision processes by means of a two-phase time aggregation algorithm
    Arruda, E. F.
    Fragoso, M. D.
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2015, 240 (03) : 697 - 705
  • [25] Average cost criterion induced by the regular utility function for continuous-time Markov decision processes
    Qingda Wei
    Xian Chen
    Discrete Event Dynamic Systems, 2017, 27 : 501 - 524
  • [26] The convergence of value iteration in average cost Markov decision chains
    Sennott, LI
    OPERATIONS RESEARCH LETTERS, 1996, 19 (01) : 11 - 16
  • [27] BATCH POLICY LEARNING IN AVERAGE REWARD MARKOV DECISION PROCESSES
    Liao, Peng
    Qi, Zhengling
    Wan, Runzhe
    Klasnja, Predrag
    Murphy, Susan A.
    ANNALS OF STATISTICS, 2022, 50 (06) : 3364 - 3387
  • [28] RISK-SENSITIVE AVERAGE OPTIMALITY IN MARKOV DECISION PROCESSES
    Sladky, Karel
    KYBERNETIKA, 2018, 54 (06) : 1218 - 1230
  • [30] A Markov Decision Process to Determine Optimal Policies in Moving Target
    Zheng, Jianjun
    Namin, Akbar Siami
    PROCEEDINGS OF THE 2018 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY (CCS'18), 2018 : 2321 - 2323