Reinforcement learning for robust adaptive control of partially unknown nonlinear systems subject to unmatched uncertainties

Cited by: 27
Authors
Yang, Xiong [1 ,2 ]
He, Haibo [2 ]
Wei, Qinglai [3 ]
Luo, Biao [3 ]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881 USA
[3] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
Funding
National Science Foundation (US); National Natural Science Foundation of China;
Keywords
Adaptive dynamic programming; Neural networks; Optimal control; Reinforcement learning; Robust control; Unmatched uncertainty; FAULT-TOLERANT CONTROL; LAPLACIAN FRAMEWORK; TRACKING CONTROL; APPROXIMATION; STABILIZATION; DESIGN;
DOI
10.1016/j.ins.2018.06.022
CLC classification number
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
This paper proposes a novel robust adaptive control strategy for partially unknown continuous-time nonlinear systems subject to unmatched uncertainties. First, the robust nonlinear control problem is converted into a nonlinear optimal control problem by constructing an appropriate value function for an auxiliary system. Then, within the framework of reinforcement learning, an identifier-critic architecture is developed. The architecture uses two neural networks: an identifier neural network (INN), which estimates the unknown internal dynamics, and a critic neural network (CNN), which derives an approximate solution of the Hamilton-Jacobi-Bellman equation arising in the resulting optimal control problem. The INN is updated using the back-propagation algorithm together with the e-modification technique, while the CNN is updated via a modified gradient-descent method that uses historical and current state data simultaneously. Based on the classical Lyapunov technique, all signals in the closed-loop auxiliary system are proved to be uniformly ultimately bounded. Moreover, the original system remains asymptotically stable under the obtained approximate optimal control. Finally, two illustrative examples, including the F-16 aircraft plant, are provided to demonstrate the effectiveness of the developed method.
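For context, the Hamilton-Jacobi-Bellman equation referred to above can be sketched for the standard affine-in-control optimal control setup on which such identifier-critic designs are typically built (a generic sketch only; the paper's specific auxiliary-system value function and unmatched-uncertainty terms are not reproduced here):

% Assumed standard setup: dynamics \dot{x} = f(x) + g(x)u with infinite-horizon cost
V(x(0)) = \int_0^{\infty} \big( Q(x) + u^{\top} R u \big)\, \mathrm{d}t .
% Minimizing the Hamiltonian over u gives the optimal control
u^{*}(x) = -\tfrac{1}{2} R^{-1} g^{\top}(x)\, \nabla V^{*}(x) ,
% and substituting it back yields the Hamilton-Jacobi-Bellman (HJB) equation
0 = Q(x) + \nabla V^{*\top}(x) f(x) - \tfrac{1}{4}\, \nabla V^{*\top}(x)\, g(x) R^{-1} g^{\top}(x)\, \nabla V^{*}(x) ,
% whose solution V^{*} is what the critic neural network approximates online.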
Pages: 307-322
Page count: 16