Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning

Cited by: 546

Authors:

Modares, Hamidreza [1]

Lewis, Frank L. [1]

Affiliation:

[1] Univ Texas Arlington, Res Inst, Ft Worth, TX 76118 USA

Source:

AUTOMATICA | 2014 / Vol. 50 / No. 07

Keywords:

Optimal tracking control; Integral reinforcement learning; Input constraints; Neural networks; ADAPTIVE OPTIMAL-CONTROL; POLICY ITERATION; TIME-SYSTEMS; APPROXIMATION

DOI:

10.1016/j.automatica.2014.05.011

Chinese Library Classification number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

In this paper, a new formulation for the optimal tracking control problem (OTCP) of continuous-time nonlinear systems is presented. This formulation extends the integral reinforcement learning (IRL) technique, a method for solving optimal regulation problems, to learn the solution to the OTCP. Unlike existing solutions to the OTCP, the proposed method does not need to have or to identify knowledge of the system drift dynamics, and it also takes into account the input constraints a priori. An augmented system composed of the error system dynamics and the command generator dynamics is used to introduce a new nonquadratic discounted performance function for the OTCP. This encodes the input constraints into the optimization problem. A tracking Hamilton-Jacobi-Bellman (HJB) equation associated with this nonquadratic performance function is derived which gives the optimal control solution. An online IRL algorithm is presented to learn the solution to the tracking HJB equation without knowing the system drift dynamics. Convergence to a near-optimal control solution and stability of the whole system are shown under a persistence of excitation condition. Simulation examples are provided to show the effectiveness of the proposed method. © 2014 Elsevier Ltd. All rights reserved.
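The formulation summarized in the abstract can be sketched in equations. This is a reconstruction under assumed notation, not text taken from this record: $x$ is the system state, $x_d$ the command generator state, $e_d = x - x_d$ the tracking error, $f$ the unknown drift dynamics, $g$ the input dynamics, $h_d$ the command generator dynamics, $\gamma > 0$ a discount factor, $\lambda$ the input saturation bound, $Q \succeq 0$ and diagonal $R \succ 0$ weighting matrices, and $T$ the reinforcement interval.

```latex
% Assumed notation; a sketch of a constrained-input IRL tracking setup.
\begin{align}
  % Augmented system: error dynamics stacked with command generator dynamics
  X &= \begin{bmatrix} e_d \\ x_d \end{bmatrix}, \qquad
  \dot X = F(X) + G(X)\,u, \\
  F(X) &= \begin{bmatrix} f(e_d + x_d) - h_d(x_d) \\ h_d(x_d) \end{bmatrix}, \qquad
  G(X) = \begin{bmatrix} g(e_d + x_d) \\ 0 \end{bmatrix}, \\
  % Discounted nonquadratic performance function (encodes |u_i| \le \lambda)
  V(X(t)) &= \int_t^{\infty} e^{-\gamma(\tau - t)}
            \left[ e_d^{\top} Q\, e_d + U(u) \right] d\tau, \\
  U(u) &= 2 \int_0^{u} \left( \lambda \tanh^{-1}(v/\lambda) \right)^{\top} R \, dv, \\
  % Tracking HJB equation and the resulting bounded optimal control
  0 &= e_d^{\top} Q\, e_d + U(u^*) - \gamma V^* + (\nabla V^*)^{\top}\left( F + G u^* \right), \\
  u^* &= -\lambda \tanh\!\left( \tfrac{1}{2\lambda} R^{-1} G^{\top} \nabla V^* \right), \\
  % IRL Bellman equation over the interval [t-T, t]
  V(X(t-T)) &= \int_{t-T}^{t} e^{-\gamma(\tau - t + T)}
            \left[ e_d^{\top} Q\, e_d + U(u) \right] d\tau
            + e^{-\gamma T}\, V(X(t)).
\end{align}
```

Under these assumptions, the drift term $F$ does not appear in the last equation, which is consistent with the abstract's claim that the solution is learned without knowing the system drift dynamics; the input dynamics $G$ still enter through $u^*$, matching the "partially-unknown" setting in the title.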


Pages: 1780-1792

Page count: 13
