Value iteration and approximately optimal stationary policies in finite-state average Markov decision chains

Cited by: 6
Author
Cavazos-Cadena, R [1]
Affiliations
[1] Univ Autonoma Agraria Antonio Narro, Dept Estadistica Calculo, Saltillo 25315, Coahuila, Mexico
[2] Univ Autonoma Coahuila, Ctr Invest Socioecon, Saltillo 25315, Coahuila, Mexico
Keywords
successive approximations; Markov decision processes; Schweitzer's transformation; optimality equation; convergence of the value iteration approximations
DOI
10.1007/s001860200205
Chinese Library Classification (CLC) number
C93 [Management science]; O22 [Operations research]
Discipline classification code
070105; 12; 1201; 1202; 120202
Abstract
This work concerns finite-state Markov decision chains endowed with the long-run average reward criterion. Assuming that the optimality equation has a solution, it is shown that a nearly optimal stationary policy, as well as an approximation to the optimal average reward within a specified error, can be obtained in a finite number of steps of the value iteration method. These results extend others already available in the literature, which were established under more stringent restrictions on the ergodic structure of the decision process.
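As an illustration of the kind of procedure the abstract describes, the following is a minimal sketch of value iteration for the average reward criterion with a finite stopping rule. It is not taken from the paper: it uses the textbook span-based stopping test together with a Schweitzer-type aperiodicity transformation (replacing each transition matrix P by tau*P + (1 - tau)*I, which leaves the optimal average reward unchanged). The function name, the data layout `P[x][a][y]` and `r[x][a]`, the parameters `tau`, `eps`, `max_iter`, and the two-state example are all assumptions made for illustration.

```python
def value_iteration_average(P, r, eps=1e-6, tau=0.5, max_iter=100_000):
    """Sketch of average-reward value iteration with a span stopping rule.

    P[x][a][y] is the probability of moving from state x to y under action a,
    r[x][a] is the one-step reward (hypothetical layout, chosen for this sketch).
    Returns (gain_estimate, greedy_policy) once the span of successive
    differences falls below eps.
    """
    n = len(P)
    V = [0.0] * n
    for _ in range(max_iter):
        # One step of value iteration on the transformed kernel tau*P + (1-tau)*I.
        Q = [[r[x][a] + (1 - tau) * V[x]
              + tau * sum(P[x][a][y] * V[y] for y in range(n))
              for a in range(len(P[x]))]
             for x in range(n)]
        V_new = [max(q) for q in Q]
        diff = [V_new[x] - V[x] for x in range(n)]
        lo, hi = min(diff), max(diff)
        if hi - lo < eps:
            # The greedy policy has average reward at least lo, and the optimal
            # average reward is at most hi, so the midpoint is within eps/2.
            policy = [max(range(len(Q[x])), key=Q[x].__getitem__)
                      for x in range(n)]
            return (lo + hi) / 2, policy
        # Subtract a constant to keep the iterates bounded; the successive
        # differences are unaffected by this shift.
        V = [v - V_new[0] for v in V_new]
    raise RuntimeError("span criterion not met within max_iter iterations")


# Hypothetical two-state example: in each state, action 0 stays put and
# action 1 jumps to the other state. Staying pays 1 in state 0 and 2 in state 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
r = [[1.0, 0.0],
     [2.0, 0.0]]
gain, policy = value_iteration_average(P, r)
```

In this example the policy that stays in both states has two recurrent classes, so the model is not unichain, yet the optimality equation still has a solution, which is the standing assumption in the abstract. The sketch stops after finitely many iterations with an estimate of the optimal average reward and a stationary policy whose average reward is within `eps` of optimal.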
Pages: 181-196
Number of pages: 16