Policy iteration for average cost Markov control processes on Borel spaces

Cited by: 15

Authors:

HernandezLerma, O [1]

Lasserre, JB [1]

Affiliations:

[1] CNRS, LAAS, F-31077 TOULOUSE, FRANCE

Source:

ACTA APPLICANDAE MATHEMATICAE | 1997 / Vol. 47 / Issue 02

Keywords:

(discrete-time) Markov control processes; average cost; policy iteration (aka Howard's algorithm)

DOI:

10.1023/A:1005781013253

Chinese Library Classification (CLC) number:

O29 [Applied Mathematics]

Discipline classification code:

070104

Abstract:

This paper studies the policy iteration algorithm (PIA) for average cost Markov control processes (MCPs) on Borel spaces. Two classes of MCPs are considered. One of them allows some restricted-growth unbounded cost functions and compact control constraint sets; the other one requires strictly unbounded costs and the control constraint sets may be non-compact. For each of these classes, the PIA yields, under suitable assumptions, the optimal (minimum) cost, an optimal stationary control policy, and a solution to the average cost optimality equation.


Pages: 125-154

Number of pages: 30

