Value iteration and approximately optimal stationary policies in finite-state average Markov decision chains

Cited by: 6

Authors:

Cavazos-Cadena, R [1]

Affiliations:

[1] Univ Autonoma Agraria Antonio Narro, Dept Estadistica Calculo, Saltillo 25315, Coahuila, Mexico

[2] Univ Autonoma Coahuila, Ctr Invest Socioecon, Saltillo 25315, Coahuila, Mexico

Source:

MATHEMATICAL METHODS OF OPERATIONS RESEARCH | 2002 / Vol. 56 / No. 2

Keywords:

successive approximations; Markov decision processes; Schweitzer's transformation; optimality equation; convergence of the value iteration approximations;

DOI:

10.1007/s001860200205

Chinese Library Classification:

C93 [Management Science]; O22 [Operations Research];

Subject classification codes:

070105 ; 12 ; 1201 ; 1202 ; 120202 ;

Abstract:

This work concerns finite-state Markov decision chains endowed with the long-run average reward criterion. Assuming that the optimality equation has a solution, it is shown that a nearly optimal stationary policy, as well as an approximation to the optimal average reward within a specified error, can be obtained in a finite number of steps of the value iteration method. These results extend others already available in the literature, which were established under more stringent restrictions on the ergodic structure of the decision process.
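As an illustration only, the following Python sketch shows one common way value iteration can be combined with Schweitzer's transformation and a span-based stopping rule to return an approximate optimal average reward and a stationary policy. It is not taken from the paper: the function name `value_iteration`, the data layout `P[a][x][y]` and `r[a][x]`, the parameters `tau`, `eps`, and `max_iter`, and the assumption that every action is available in every state are all hypothetical choices made for this example.

```python
def value_iteration(P, r, eps=1e-6, tau=0.5, max_iter=100_000):
    """Sketch of average-reward value iteration (illustrative, not the paper's code).

    P[a][x][y]: probability of moving from state x to y under action a (assumed layout).
    r[a][x]:    one-step reward in state x under action a (assumed layout).
    tau:        Schweitzer transformation parameter in (0, 1).
    Returns (gain_estimate, policy), where policy[x] is the chosen action in state x.
    """
    n_actions, n_states = len(P), len(P[0])
    v = [0.0] * n_states
    for _ in range(max_iter):
        new_v, policy = [], []
        for x in range(n_states):
            # Backup on the transformed model: reward tau*r, kernel tau*P + (1 - tau)*I.
            q = [
                tau * r[a][x]
                + tau * sum(P[a][x][y] * v[y] for y in range(n_states))
                + (1 - tau) * v[x]
                for a in range(n_actions)
            ]
            best = max(range(n_actions), key=q.__getitem__)
            policy.append(best)
            new_v.append(q[best])
        # Successive differences bracket the transformed gain; divide by tau to rescale.
        diff = [nv - ov for nv, ov in zip(new_v, v)]
        lo, hi = min(diff) / tau, max(diff) / tau
        if hi - lo < eps:
            return (lo + hi) / 2, policy
        # Subtract a constant to keep the iterates bounded; differences are unaffected.
        v = [nv - new_v[0] for nv in new_v]
    raise RuntimeError("span criterion not met within max_iter")
```

When the loop stops, the greedy policy's average reward and the optimal average reward both lie in the interval `[lo, hi]`, so the returned midpoint is within `eps / 2` of the optimum and the policy is `eps`-optimal under the stated assumptions.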


Pages: 181-196

Page count: 16
