STABLE OPTIMAL CONTROL AND SEMICONTRACTIVE DYNAMIC PROGRAMMING

Cited by: 9
Authors
Bertsekas, Dimitri P. [1 ,2 ]
Affiliations
[1] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA
[2] MIT, Lab Informat & Decis Syst, Cambridge, MA 02139 USA
Keywords
stable policy; dynamic programming; shortest path; value iteration; policy iteration; discrete-time optimal control;
DOI
10.1137/17M1122815
Chinese Library Classification
TP [Automation technology; computer technology];
Subject Classification Code
0812;
Abstract
We consider discrete-time infinite horizon deterministic optimal control problems with nonnegative cost per stage, and a destination that is cost-free and absorbing. The classical linear-quadratic regulator problem is a special case. Our assumptions are very general, and allow the possibility that the optimal policy may not stabilize the system, e.g., may not reach the destination either asymptotically or in a finite number of steps. We introduce a new unifying notion of stable feedback policy, based on perturbation of the cost per stage, which, in addition to implying convergence of the generated states to the destination, quantifies the speed of convergence. We consider the properties of two distinct cost functions: J*, the overall optimal, and Ĵ, the restricted optimal over just the stable policies. Different classes of stable policies (with different speeds of convergence) may yield different values of Ĵ. We show that for any class of stable policies, Ĵ is a solution of Bellman's equation, and we characterize the smallest and the largest such solutions: they are J* and J⁺, the restricted optimal cost function over the class of (finitely) terminating policies. We also characterize the regions of convergence of various modified versions of the value and policy iteration algorithms, as substitutes for the standard algorithms, which may not work in general.
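The perturbation idea summarized in the abstract can be illustrated with a minimal sketch, assuming a toy deterministic shortest-path problem invented for this example: state 0 is the cost-free absorbing destination, and a small perturbation δ > 0 is added to every stage cost so that value iteration penalizes policies that cycle forever at zero cost. The graph, costs, and function names below are hypothetical, not from the paper.

```python
# Value iteration for a deterministic problem with a cost-free,
# absorbing destination (state 0). The perturbation delta > 0 is
# added to each stage cost, so nonterminating (zero-cost cycling)
# policies accumulate infinite perturbed cost and are ruled out.

def value_iteration(transitions, delta=0.0, sweeps=100):
    """transitions[x] = list of (next_state, stage_cost) pairs.
    State 0 is the destination: cost-free and absorbing."""
    J = {x: 0.0 for x in transitions}
    for _ in range(sweeps):
        J_new = dict(J)
        for x, arcs in transitions.items():
            if x == 0:
                continue  # destination stays at cost 0
            J_new[x] = min(cost + delta + J[y] for y, cost in arcs)
        J = J_new
    return J

# Tiny example: 1 -> 0 costs 2; 1 -> 2 and 2 -> 1 cost 0.
# With delta = 0, cycling between 1 and 2 at zero cost competes
# with moving to the destination, and value iteration returns 0.
# A positive delta penalizes long trajectories and favors the
# (finitely) terminating policy 1 -> 0.
graph = {0: [(0, 0.0)], 1: [(0, 2.0), (2, 0.0)], 2: [(1, 0.0)]}
print(value_iteration(graph, delta=0.0)[1])  # cycling looks free
print(value_iteration(graph, delta=0.1)[1])  # terminating wins
```

With δ = 0 the iteration converges to J(1) = 0 (the cycling policy appears optimal), while with δ = 0.1 it converges to 2.1, the cost of going directly to the destination plus one perturbed stage, mirroring the paper's distinction between J* and the restricted optimal cost over stable policies.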
Pages: 231-252
Page count: 22