We consider Markov decision processes (MDPs) with Borel state and action spaces and universally measurable policies. For several long-run average cost criteria and two classes of MDPs, we prove sufficient conditions for the optimal average cost functions to be constant almost everywhere with respect to certain sigma-finite measures. Besides suitable boundedness conditions on the positive parts of the one-stage costs, the key condition here is that each subset of states with positive measure be reachable with probability one under some policy. Our proofs exploit an inequality for the optimal average cost functions and its connection with submartingales, and, in a special case that involves stationary policies, also use the theory of recurrent Markov chains. (c) 2021 Elsevier Inc. All rights reserved.
Affiliations:
Univ Nacl Autonoma Mexico, Inst Invest Matemat Aplicadas & Sistemas, Dept Probabilidad & Estadist, Mexico City 01000, DF, Mexico
Univ Autonoma Nuevo Leon, Fac Ingn Mecan & Elect, San Nicolas De Los Garza 66450, NL, Mexico
Gonzalez-Hernandez, Juan
Villarreal, Cesar E.