On structural properties of optimal average cost functions in Markov decision processes with Borel spaces and universally measurable policies

Cited by: 1

Authors:

Yu, Huizhen [1]

Affiliations:

[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada

Source:

JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS | 2022 / Vol. 509 / No. 1

Keywords:

Markov decision processes; Universally measurable policies; Average cost; Submartingales; Reachability; Recurrent Markov chains; MINIMUM PAIR; EQUATION; STATE; CONVERGENCE; EXISTENCE; CHAINS

DOI:

10.1016/j.jmaa.2021.125954

Chinese Library Classification (CLC) number:

O29 [Applied Mathematics]

Discipline classification code:

070104

Abstract:

We consider Markov decision processes (MDPs) with Borel state and action spaces and universally measurable policies. For several long-run average cost criteria and two classes of MDPs, we prove sufficient conditions for the optimal average cost functions to be constant almost everywhere with respect to certain sigma-finite measures. Besides suitable boundedness conditions on the positive parts of the one-stage costs, the key condition here is that each subset of states with positive measure be reachable with probability one under some policy. Our proofs exploit an inequality for the optimal average cost functions and its connection with submartingales, and, in a special case that involves stationary policies, also use the theory of recurrent Markov chains. (c) 2021 Elsevier Inc. All rights reserved.


Pages: 23

