Policy iteration for average cost Markov control processes on Borel spaces

Cited by: 15
Authors
Hernandez-Lerma, O [1]
Lasserre, JB [1]
Affiliations
[1] CNRS, LAAS, F-31077 Toulouse, France
Keywords
(discrete-time) Markov control processes; average cost; policy iteration (aka Howard's algorithm)
DOI
10.1023/A:1005781013253
Chinese Library Classification number
O29 [Applied mathematics]
Discipline classification code
070104
Abstract
This paper studies the policy iteration algorithm (PIA) for average cost Markov control processes (MCPs) on Borel spaces. Two classes of MCPs are considered. The first allows some unbounded cost functions of restricted growth, with compact control constraint sets; the second requires strictly unbounded costs but allows non-compact control constraint sets. For each class, the PIA yields, under suitable assumptions, the optimal (minimum) average cost, an optimal stationary control policy, and a solution to the average cost optimality equation.
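As a rough illustration of the policy iteration idea named in the abstract, the sketch below runs Howard's algorithm on a small finite model. This is not the paper's Borel-space setting: it assumes finitely many states and actions and that every stationary policy induces a single recurrent class (unichain). All names (`solve_linear`, `evaluate`, `improve`, `policy_iteration`) and the two-state data `P`, `c` are hypothetical, chosen only for this example.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def evaluate(P, c, policy):
    """Policy evaluation: solve g + h(x) = c(x, f(x)) + sum_y P(y|x, f(x)) h(y),
    normalized by h(0) = 0 (assumes the induced chain is unichain)."""
    n = len(P)
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for x in range(n):
        a = policy[x]
        A[x][0] = 1.0  # column 0 holds the unknown average cost g
        for y in range(1, n):  # columns 1..n-1 hold h(1), ..., h(n-1)
            A[x][y] = (1.0 if x == y else 0.0) - P[x][a][y]
        b[x] = c[x][a]
    sol = solve_linear(A, b)
    return sol[0], [0.0] + sol[1:]


def improve(P, c, h, policy, tol=1e-12):
    """Policy improvement: switch actions only on a strict decrease."""
    n = len(P)
    new_policy = list(policy)
    for x in range(n):
        def q(a):
            return c[x][a] + sum(P[x][a][y] * h[y] for y in range(n))
        best = min(range(len(c[x])), key=q)
        if q(best) < q(policy[x]) - tol:
            new_policy[x] = best
    return new_policy


def policy_iteration(P, c, policy):
    """Alternate evaluation and improvement until the policy is stable."""
    while True:
        g, h = evaluate(P, c, policy)
        new_policy = improve(P, c, h, policy)
        if new_policy == policy:
            return g, h, policy
        policy = new_policy


# Hypothetical two-state, two-action model: P[x][a][y] and c[x][a].
P = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.3, 0.7], [0.9, 0.1]]]
c = [[2.0, 0.5],
     [1.0, 3.0]]

g, h, policy = policy_iteration(P, c, [0, 1])
print(g, h, policy)
```

On this toy model the returned triple plays the role of the three outputs listed in the abstract: `g` is the minimum average cost, `policy` is an optimal stationary policy, and the pair `(g, h)` satisfies the finite-state average cost optimality equation g + h(x) = min over a of [c(x, a) + sum over y of P(y | x, a) h(y)].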
Pages: 125-154
Page count: 30