A Version of the Euler Equation in Discounted Markov Decision Processes

Cited by: 2

Authors:

Cruz-Suarez, H. [1]

Zacarias-Espinoza, G. [1]

Vazquez-Guevara, V. [1]

Affiliation:

[1] Benemerita Univ Autonoma Puebla, Fac Ciencias Fis Matemat, CU, Puebla 72570, PUE, Mexico

Source:

JOURNAL OF APPLIED MATHEMATICS | 2012

Keywords:

UNCERTAINTY; GROWTH

DOI:

10.1155/2012/103698

Chinese Library Classification:

O29 [Applied Mathematics]

Discipline classification code:

070104

Abstract:

This paper deals with infinite-horizon Markov decision processes (MDPs) on Euclidean spaces. One approach to studying such MDPs is the dynamic programming (DP) technique, under which the optimal value function is characterized through the value iteration functions. The paper provides conditions that guarantee that the maximizers of the value iteration functions converge to the optimal policy. Then, using the Euler equation and an envelope formula, the solution of the optimal control problem is obtained. Finally, the theory is applied to a linear-quadratic control problem in order to find its optimal policy.
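The convergence of value iteration and of its per-step optimizers can be illustrated on a scalar discounted linear-quadratic problem. The sketch below is not taken from the paper: the dynamics x' = a*x + b*u, the stage cost q*x^2 + r*u^2, the discount factor alpha, and the function name are all assumptions chosen for illustration, and the problem is written as cost minimization (equivalently, maximization of the negative cost). With a quadratic guess V_n(x) = p_n * x^2, the first-order condition in u gives a linear rule u = -k_n * x, and substituting it back yields a scalar Riccati-type recursion for p_n.

```python
def riccati_value_iteration(a, b, q, r, alpha, tol=1e-12, max_iter=10_000):
    """Value iteration for a scalar discounted LQ problem (illustrative assumption).

    Model assumed here: x' = a*x + b*u, stage cost q*x**2 + r*u**2, discount alpha.
    Returns the limiting coefficient p of V(x) = p*x**2, the limiting feedback
    gain k (policy u = -k*x), and the list of gains computed at each iteration.
    """
    p = 0.0          # V_0 = 0
    gains = []
    for _ in range(max_iter):
        # First-order condition of q*x^2 + r*u^2 + alpha*p*(a*x + b*u)^2 in u
        k = alpha * a * b * p / (r + alpha * b * b * p)
        # Plug the optimizer u = -k*x back in to get the next coefficient
        p_next = q + alpha * a * a * p - alpha * a * b * p * k
        gains.append(k)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    k_star = alpha * a * b * p / (r + alpha * b * b * p)
    return p, k_star, gains


if __name__ == "__main__":
    p, k_star, gains = riccati_value_iteration(a=1.0, b=1.0, q=1.0, r=1.0, alpha=0.9)
    print("value coefficient:", p)
    print("limiting gain:", k_star)
    print("iterations:", len(gains))
```

In this toy setting the per-iteration gains settle to the gain of the limiting value function, which mirrors in miniature the kind of policy convergence the abstract describes; the general conditions under which this holds are the subject of the paper and are not reproduced here.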


Pages: 16
