Model-free Q-learning over Finite Horizon for Uncertain Linear Continuous-time Systems

Cited by: 0
Authors
Xu, Hao [1 ]
Jagannathan, S. [2 ]
Affiliations
[1] Texas A&M Univ, Coll Sci & Engn, Corpus Christi, TX 78412 USA
[2] Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Rolla, MO USA
Source
2014 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING (ADPRL) | 2014
Keywords
Adaptive Dynamic Programming (ADP); Q-learning; Optimal Control; Riccati Equation; Forward-in-time
DOI
None available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, a novel finite-horizon optimal control scheme is introduced for linear continuous-time systems using adaptive dynamic programming (ADP). First, a new time-varying Q-function parameterization and its estimator are introduced. Subsequently, the Q-function estimator is tuned online using both the Bellman equation in integral form and the terminal cost. Finally, a near-optimal control gain is obtained from the Q-function estimator. All closed-loop signals are shown to be bounded via Lyapunov stability analysis, where the bounds are functions of the initial conditions and the final time, while the estimated control input converges close to its optimal value. Simulation results illustrate the effectiveness of the proposed scheme.
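The abstract describes estimating a quadratic Q-function online from data, without a system model, and extracting the control gain from the estimated kernel. As a rough illustration of that idea only, the sketch below solves a simplified, *discrete-time infinite-horizon* LQR problem by least-squares Q-learning (Bradtke-style policy iteration); it is not the paper's continuous-time finite-horizon algorithm. The plant matrices `A`, `B`, the weights `Qc`, `R`, and the initial stabilizing gain are arbitrary assumptions chosen for the demo.

```python
import numpy as np

# Hypothetical double-integrator-like plant and cost weights
# (illustrative assumptions, not taken from the paper).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Qc = np.eye(2)          # state cost weight
R = np.array([[1.0]])   # input cost weight

def quad_basis(z):
    """Basis so that theta . quad_basis(z) == z^T H z for symmetric H
    stored as its upper triangle (off-diagonal entries doubled)."""
    n = len(z)
    return np.array([z[i] * z[j] * (1.0 if i == j else 2.0)
                     for i in range(n) for j in range(i, n)])

def q_learning_lqr(K, iters=10, samples=60, noise=0.5, seed=0):
    """Least-squares Q-learning (policy iteration) for discrete-time LQR.
    K must be an initial stabilizing gain for u = -K x."""
    rng = np.random.default_rng(seed)
    n, m = B.shape
    for _ in range(iters):
        Phi, y = [], []
        for _ in range(samples):
            x = rng.standard_normal(n)                   # random state sample
            u = -K @ x + noise * rng.standard_normal(m)  # exploratory input
            x1 = A @ x + B @ u                           # one-step transition
            z = np.concatenate([x, u])
            z1 = np.concatenate([x1, -K @ x1])           # follow current policy
            # One-step Bellman equation: Q(z) - Q(z1) = stage cost
            Phi.append(quad_basis(z) - quad_basis(z1))
            y.append(x @ Qc @ x + u @ R @ u)
        theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
        # Rebuild the symmetric Q-function kernel H from theta.
        H = np.zeros((n + m, n + m))
        k = 0
        for i in range(n + m):
            for j in range(i, n + m):
                H[i, j] = H[j, i] = theta[k]
                k += 1
        # Greedy policy improvement: u = -H_uu^{-1} H_ux x.
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return K

K = q_learning_lqr(np.array([[1.0, 1.0]]))  # stabilizing initial gain
print(K)
```

Because the simulated transitions are noiseless, the least-squares step recovers the Q-function kernel of each evaluated policy exactly, and the resulting gain converges to the Riccati-optimal feedback; note that, unlike the paper's forward-in-time scheme, this sketch assumes an admissible initial gain and an infinite horizon.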
Pages: 164-169
Page count: 6