Stochastic control via direct comparison

Cited by: 9
Authors
Cao, Xi-Ren [1]
Wang, De-Xin [2]
Lu, Tao
Xu, Yifan [3]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai 200030, Peoples R China
[2] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Hong Kong, Peoples R China
[3] Fudan Univ, Sch Management, Shanghai 200433, Peoples R China
Source
DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS | 2011, Volume 21, Issue 01
Keywords
Dynamic programming; Markov decision processes; HJB equation; Performance potentials; Poisson equation; Perturbation analysis; Sensitivity-based optimization; MARKOV DECISION-PROCESSES; POLICY ITERATION; OPTIMALITY;
DOI
10.1007/s10626-010-0093-4
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The standard approach to stochastic control is dynamic programming. In this paper, we introduce an alternative approach based on direct comparison of the performance of any two policies. This is achieved by modeling the state process as a continuous-time and continuous-state Markov process and applying the same ideas as for the discrete-time and discrete-state case. This approach is simple and intuitively clear; it applies in the same way to different problems: finite and infinite horizons, discounted and long-run-average performance, and continuous and jump diffusions. Discounting is not needed when dealing with long-run-average performance. The approach provides a unified framework for stochastic control and other optimization theories and methodologies, including Markov decision processes, perturbation analysis, and reinforcement learning.
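The abstract refers to the discrete-time, discrete-state case without stating it. As a rough illustration only, the sketch below checks the standard average-reward performance difference formula for two finite ergodic Markov chains, eta' - eta = pi' [(P' - P) g + (f' - f)], where g is the performance potential solving the Poisson equation (I - P) g + eta e = f. The formula is taken from the general sensitivity-based optimization literature, not from this record, and the matrices, rewards, and function names are hypothetical values chosen for the example.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def stationary(p):
    """Stationary distribution pi of an ergodic chain: pi P = pi, sum(pi) = 1."""
    n = len(p)
    # Row j encodes pi_j - sum_i pi_i p_ij = 0; the last row is replaced
    # by the normalization sum_i pi_i = 1.
    a = [[(1.0 if i == j else 0.0) - p[i][j] for i in range(n)] for j in range(n)]
    a[-1] = [1.0] * n
    return solve(a, [0.0] * (n - 1) + [1.0])


def average_reward(p, f):
    """Long-run average reward eta = pi f."""
    return sum(x * y for x, y in zip(stationary(p), f))


def potentials(p, f):
    """Potentials g from (I - P + e pi) g = f, which implies the Poisson equation."""
    n = len(p)
    pi = stationary(p)
    a = [[(1.0 if i == j else 0.0) - p[i][j] + pi[j] for j in range(n)]
         for i in range(n)]
    return solve(a, f)


def difference_formula(p, f, p2, f2):
    """Right-hand side pi' [(P' - P) g + (f' - f)] of the difference formula."""
    n = len(p)
    g = potentials(p, f)
    pi2 = stationary(p2)
    return sum(
        pi2[i] * (sum((p2[i][j] - p[i][j]) * g[j] for j in range(n)) + f2[i] - f[i])
        for i in range(n)
    )


# Hypothetical two-state example: (P, f) for one policy, (P2, f2) for another.
P = [[0.5, 0.5], [0.2, 0.8]]
f = [1.0, 3.0]
P2 = [[0.9, 0.1], [0.6, 0.4]]
f2 = [2.0, 0.0]

direct = average_reward(P2, f2) - average_reward(P, f)
via_formula = difference_formula(P, f, P2, f2)
print(direct, via_formula)
```

Because every entry of pi' is positive for an ergodic chain, the sign of the vector (P' - P) g + (f' - f) is enough to compare the two policies in this setting, which is the sense in which the comparison is "direct".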
Pages: 11-38
Number of pages: 28