Adaptive dynamic programming

Times cited: 515
Authors
Murray, JJ [1]
Cox, CJ
Lendaris, GG
Saeks, R
Affiliations
[1] SUNY Stony Brook, Dept Elect Engn, Stony Brook, NY 11790 USA
[2] Accurate Automat Corp, Chattanooga, TN 37421 USA
Source
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS | 2002 / Vol. 32 / No. 2
Funding
National Aeronautics and Space Administration (NASA); National Science Foundation (NSF);
Keywords
adaptive control; adaptive critic; dynamic programming; nonlinear control; optimal control;
DOI
10.1109/TSMCC.2002.801727
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Unlike the many soft computing applications where it suffices to achieve a "good approximation most of the time," a control system must be stable all of the time. As such, if one desires to learn a control law in real time, one must fuse soft computing techniques, which learn the appropriate control law, with hard computing techniques, which maintain the stability constraint and guarantee convergence. The objective of the present paper is to describe an adaptive dynamic programming algorithm (ADPA) which fuses soft computing techniques to learn the optimal cost (or return) functional for a stabilizable nonlinear system with unknown dynamics and hard computing techniques to verify the stability and convergence of the algorithm. Specifically, the algorithm is initialized with a (stabilizing) cost functional and the system is run with the corresponding control law (defined by the Hamilton-Jacobi-Bellman equation), with the resultant state trajectories used to update the cost functional in a soft computing mode. Hard computing techniques are then used to show that this process is globally convergent with stepwise stability to the optimal cost functional/control law pair for an (unknown) input affine system with an input quadratic performance measure (modulo the appropriate technical conditions). Three specific implementations of the ADPA are developed: 1) for the linear case, 2) for the nonlinear case using a locally quadratic approximation to the cost functional, and 3) for the nonlinear case using a radial basis function approximation of the cost functional. These implementations are illustrated by applications to flight control.
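The iteration described in the abstract (start from a stabilizing cost functional, run the induced control law, re-evaluate the cost, repeat) can be illustrated for the linear case with classical model-based policy iteration (Kleinman's algorithm). This is a swapped-in technique, not the paper's ADPA: it assumes the matrices A and B are known, whereas the ADPA updates the cost functional from measured state trajectories with unknown dynamics. For an input affine system with a quadratic cost, the HJB control law u = -1/2 R^{-1} g(x)^T grad V(x) reduces to u = -R^{-1} B^T P x when V(x) = x^T P x. The function names, the double-integrator plant, and the weights Q = I, R = 1 below are hypothetical choices made for illustration.

```python
import numpy as np


def evaluate_policy(A, B, Q, R, K):
    """Policy evaluation: return P solving the Lyapunov equation
    (A - B K)^T P + P (A - B K) + Q + K^T R K = 0,
    so that V(x) = x^T P x is the cost of running u = -K x."""
    n = A.shape[0]
    Ac = A - B @ K
    Qk = Q + K.T @ R @ K
    I = np.eye(n)
    # Column-major vectorization:
    # vec(Ac^T P + P Ac) = (kron(I, Ac^T) + kron(Ac^T, I)) vec(P)
    M = np.kron(I, Ac.T) + np.kron(Ac.T, I)
    p = np.linalg.solve(M, -Qk.flatten(order="F"))
    P = p.reshape((n, n), order="F")
    return 0.5 * (P + P.T)  # symmetrize to remove round-off


def improve_policy(B, R, P):
    """Policy improvement: with V(x) = x^T P x, the HJB minimizer
    u = -1/2 R^{-1} B^T grad V(x) becomes u = -K x, K = R^{-1} B^T P."""
    return np.linalg.solve(R, B.T @ P)


def policy_iteration(A, B, Q, R, K0, max_iters=50, tol=1e-10):
    """Alternate evaluation and improvement from a stabilizing gain K0."""
    K, P_prev = K0, None
    for _ in range(max_iters):
        # Stepwise stability check: A - B K must be Hurwitz at every step.
        if np.max(np.linalg.eigvals(A - B @ K).real) >= 0:
            raise ValueError("current gain is not stabilizing")
        P = evaluate_policy(A, B, Q, R, K)
        K = improve_policy(B, R, P)
        if P_prev is not None and np.max(np.abs(P - P_prev)) < tol:
            break
        P_prev = P
    return P, K


# Hypothetical example: double integrator with Q = I, R = 1.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 1.0]])  # stabilizing: closed-loop poles are roots of s^2 + s + 1

P, K = policy_iteration(A, B, Q, R, K0)
print("P =\n", P)
print("K =", K)
```

For this plant the iterates should approach the closed-form Riccati solution P = [[sqrt(3), 1], [1, sqrt(3)]] with gain K = [1, sqrt(3)]. The eigenvalue check mirrors the "stepwise stability" property named in the abstract: in the linear, known-model setting, Kleinman's result guarantees that every improved gain stays stabilizing when the initial gain is.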
Pages: 140-153
Number of pages: 14