INFINITE HORIZON AVERAGE COST DYNAMIC PROGRAMMING SUBJECT TO TOTAL VARIATION DISTANCE AMBIGUITY

Cited by: 10
Authors
Tzortzis, Ioannis [1 ]
Charalambous, Charalambos D. [1 ]
Charalambous, Themistoklis [2 ]
Affiliations
[1] Univ Cyprus, Elect & Comp Engn, CY-1678 Nicosia, Cyprus
[2] Aalto Univ, Dept Elect Engn & Automat, Espoo 02150, Finland
Funding
Academy of Finland;
Keywords
stochastic control; Markov control models; minimax; dynamic programming; average cost; infinite horizon; total variation distance; policy iteration; STOCHASTIC UNCERTAIN SYSTEMS; MINIMAX OPTIMAL-CONTROL; RISK-SENSITIVE CONTROL; MARKOV; CRITERION; GAMES;
DOI
10.1137/18M1210514
Chinese Library Classification (CLC)
TP [automation technology, computer technology];
Discipline code
0812;
Abstract
We analyze the per unit-time infinite horizon average cost Markov control model, subject to a total variation distance ambiguity on the conditional distribution of the controlled process. This stochastic optimal control problem is formulated as a minimax optimization in which the minimization is over the admissible set of control strategies, while the maximization is over the set of conditional distributions lying in a ball, with respect to the total variation distance, centered at a nominal distribution. We derive two new equivalent dynamic programming equations and a new policy iteration algorithm. The main feature of the new dynamic programming equations is that the optimal control strategies are insensitive to inaccuracies or ambiguities in the conditional distribution of the controlled process. The main feature of the new policy iteration algorithm is that the policy evaluation and policy improvement steps are performed using the maximizing conditional distribution, which is obtained via a water-filling solution that aggregates states together to form new states. Throughout the paper, we illustrate the new dynamic programming equations and the corresponding policy iteration algorithm through various examples.
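The maximizing conditional distribution described in the abstract admits, on a finite state space, a simple water-filling form: for a total variation radius R, mass R/2 is added to the highest-value state and the same mass is drained from the lowest-value states. The sketch below illustrates this idea under those assumptions; the function name, tie-breaking at the argmax, and the capping logic are illustrative choices, not the authors' implementation.

```python
import numpy as np

def tv_maximizing_distribution(nu, v, radius):
    """Maximize sum(mu * v) over distributions mu within total variation
    distance `radius` of the nominal distribution `nu` (finite support).

    Water-filling sketch: add mass radius/2 to the state with the largest
    value v, and remove the same mass from the lowest-value states,
    emptying each in turn, so that mu remains a probability vector.
    """
    nu = np.asarray(nu, dtype=float)
    v = np.asarray(v, dtype=float)
    mu = nu.copy()

    hi = int(np.argmax(v))                    # highest-value state
    alpha = min(radius / 2.0, 1.0 - nu[hi])   # mass to transfer, capped at 1

    mu[hi] += alpha
    remaining = alpha
    for i in np.argsort(v):                   # drain lowest-value states first
        if i == hi:
            continue
        take = min(remaining, mu[i])
        mu[i] -= take
        remaining -= take
        if remaining <= 0.0:
            break
    return mu
```

For example, with a uniform nominal distribution over four states, values (1, 2, 3, 4), and radius 0.4, the sketch moves 0.2 of mass from the lowest-value state to the highest-value one, raising the expected value from 2.5 to 3.1. Inside a policy evaluation step, this worst-case row would stand in for the nominal transition row of the controlled process.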
Pages: 2843-2872
Page count: 30
References (30 items)
[1] [Anonymous], 1986, Stochastic Systems: Estimation, Identification, and Adaptive Control.
[2] Arapostathis A., Borkar V.S., Fernandez-Gaucherand E., Ghosh M.K., Marcus S.I. Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM Journal on Control and Optimization, 1993, 31(2): 282-344.
[3] Baras J.S., 2005, Proceedings of the IEEE Conference on Decision and Control, p. 1043.
[4] Basar T., 1995, H∞ Optimal Control and Related Minimax Design Problems: A Dynamic Game Approach.
[5] Bensoussan A., Elliott R.J. A finite-dimensional risk-sensitive control problem. SIAM Journal on Control and Optimization, 1995, 33(6): 1834-1846.
[6] Bertsekas D.P., 1976, Dynamic Programming and Stochastic Control.
[7] Borkar V.S. Control of Markov chains with long-run average cost criterion: the dynamic programming equations. SIAM Journal on Control and Optimization, 1989, 27(3): 642-657.
[8] Borkar V.S. On minimum cost per unit time control of Markov chains. SIAM Journal on Control and Optimization, 1984, 22(6): 965-978.
[9] Caines P.E., 1988, Linear Stochastic Systems.
[10] Charalambous C.D., 1996, Stochastics, 57: 247.