Online optimal and adaptive integral tracking control for varying discrete-time systems using reinforcement learning

Cited by: 8
Authors
Sanusi, Ibrahim [1 ]
Mills, Andrew [1 ]
Dodd, Tony [1 ]
Konstantopoulos, George [1 ]
Affiliations
[1] Univ Sheffield, Dept Automat Control & Syst Engn, Sheffield, S Yorkshire, England
Keywords
adaptive control; adaptive dynamic programming; optimal tracking control; Q-function approximation; reinforcement learning; nonlinear systems; control scheme; design
DOI
10.1002/acs.3115
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
The conventional closed-form solution to the optimal control problem is available only under the assumption that the system dynamics are known and described by differential equations. Without such models, reinforcement learning (RL) has been successfully applied to solve the optimal control problem iteratively for unknown or varying systems. For the optimal tracking control problem, existing RL techniques in the literature rely on either a predetermined feedforward input, restrictive assumptions on the reference model dynamics, or discounted tracking costs; moreover, with discounted costs, zero steady-state error cannot be guaranteed. This article therefore presents an online optimal RL tracking control framework for discrete-time (DT) systems that imposes none of these restrictive assumptions and guarantees zero steady-state tracking error. This is achieved by augmenting the original system dynamics with the integral of the error between the reference inputs and the tracked outputs before applying the online RL framework. It is further shown that the value function of the DT linear quadratic tracker under this augmented formulation with integral control remains quadratic. This enables Bellman equations that use only system measurements to solve the corresponding DT algebraic Riccati equation and obtain the optimal tracking control inputs online. Two RL strategies are then proposed, based on value function approximation and on Q-learning, together with excitation bounds that ensure convergence of the parameter estimates. Simulation case studies demonstrate the effectiveness of the proposed approach.
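To make the augmented formulation concrete, the following is a minimal Python sketch, not the authors' implementation: it augments a hypothetical plant with the integral of the tracking error, runs a measurement-driven Q-learning policy iteration of the kind the abstract describes, and compares the learned gain against the model-based solution of the DT algebraic Riccati equation. The plant matrices, cost weights, exploration noise level, and initial stabilizing gain are all illustrative assumptions.

```python
# Sketch of the integral-augmented LQ tracking idea from the abstract --
# NOT the authors' code. Plant (A, B, C), weights (Qw, R), noise level,
# and the initial stabilizing gain are illustrative assumptions.
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)

# Hypothetical plant: x_{k+1} = A x_k + B u_k,  y_k = C x_k
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])

# Augment with the integral of the tracking error, z_{k+1} = z_k + (r_k - y_k).
# Augmented state X = [x; z]; the reference injection is omitted while
# learning with r = 0.
Aa = np.block([[A, np.zeros((2, 1))],
               [-C, np.eye(1)]])
Ba = np.vstack([B, np.zeros((1, 1))])
n, m = Aa.shape[0], Ba.shape[1]

Qw = np.diag([1.0, 1.0, 10.0])  # weight on [x; z]; heavy weight on the integral
R = np.eye(m)

# Model-based benchmark: solve the DT algebraic Riccati equation directly.
P = solve_discrete_are(Aa, Ba, Qw, R)
K_are = np.linalg.solve(R + Ba.T @ P @ Ba, Ba.T @ P @ Aa)

def phi(w):
    """Quadratic basis: distinct monomials w_i * w_j, i <= j."""
    w = w.ravel()
    return np.array([w[i] * w[j] for i in range(w.size) for j in range(i, w.size)])

def theta_to_H(theta, p):
    """Rebuild the symmetric Q-function kernel H from the basis weights."""
    H = np.zeros((p, p))
    k = 0
    for i in range(p):
        for j in range(i, p):
            H[i, j] = H[j, i] = theta[k] if i == j else theta[k] / 2.0
            k += 1
    return H

# The method assumes an initial admissible (stabilizing) policy; this sketch
# takes one from a deliberately mismatched Riccati design.
P0 = solve_discrete_are(Aa, Ba, np.eye(n), R)
K = np.linalg.solve(R + Ba.T @ P0 @ Ba, Ba.T @ P0 @ Aa)

# Q-learning policy iteration driven only by measured data (X, u, cost).
for _ in range(10):
    X = rng.standard_normal((n, 1))
    Phi, targets = [], []
    for _ in range(200):
        u = -K @ X + 0.1 * rng.standard_normal((m, 1))  # exploration noise (PE)
        Xn = Aa @ X + Ba @ u
        un = -K @ Xn                                    # on-policy successor input
        cost = float(X.T @ Qw @ X + u.T @ R @ u)
        # Bellman equation: Q(X, u) - Q(Xn, un) = cost
        Phi.append(phi(np.vstack([X, u])) - phi(np.vstack([Xn, un])))
        targets.append(cost)
        X = Xn
    theta, *_ = np.linalg.lstsq(np.asarray(Phi), np.asarray(targets), rcond=None)
    H = theta_to_H(theta, n + m)
    K = np.linalg.solve(H[n:, n:], H[n:, :n])  # greedy improvement: u = -K X

print("model-free gain:", K.round(4))
print("ARE benchmark  :", K_are.round(4))

# Closed-loop check with a step reference: integral action drives y -> r.
x, z, r = np.zeros((2, 1)), np.zeros((1, 1)), 1.0
for _ in range(300):
    u = -K @ np.vstack([x, z])
    z = z + (r - C @ x)          # integrate the tracking error
    x = A @ x + B @ u
print("steady-state output:", float(C @ x))  # ~1.0, i.e., zero steady-state error
```

Because the augmented state contains the integrator z, any stabilizing gain forces r - y to vanish in steady state, which is the mechanism behind the zero steady-state-error guarantee described in the abstract.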
Pages: 971-991
Page count: 21