Neural-network-based online optimal control for uncertain non-linear continuous-time systems with control constraints

Cited by: 120
Authors
Yang, Xiong [1]
Liu, Derong [1]
Huang, Yuzhu [1]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
adaptive control; approximation theory; closed loop systems; continuous time systems; Lyapunov methods; neurocontrollers; nonlinear control systems; optimal control; robust control; uncertain systems; neural network-based online adaptive optimal control; uncertain nonlinear continuous-time systems; control constraints; infinite-horizon optimal control problem; control policy; saturation constraints; identifier-critic architecture; Hamilton-Jacobi-Bellman equation approximation; uncertain system dynamics; critic NN; action-critic dual networks; reinforcement learning; identifier NN; policy iteration; Lyapunov's direct method; closed loop system stability; SATURATING ACTUATORS; STABILIZATION; STABILITY;
DOI
10.1049/iet-cta.2013.0472
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
In this study, an online adaptive optimal control scheme is developed for solving the infinite-horizon optimal control problem of uncertain non-linear continuous-time systems with the control policy having saturation constraints. A novel identifier-critic architecture is presented to approximate the Hamilton-Jacobi-Bellman equation using two neural networks (NNs): an identifier NN is used to estimate the uncertain system dynamics and a critic NN is utilised to derive the optimal control instead of typical action-critic dual networks employed in reinforcement learning. Based on the developed architecture, the identifier NN and the critic NN are tuned simultaneously. Meanwhile, unlike initial stabilising control indispensable in policy iteration, there is no special requirement imposed on the initial control. Moreover, by using Lyapunov's direct method, the weights of the identifier NN and the critic NN are guaranteed to be uniformly ultimately bounded, while keeping the closed-loop system stable. Finally, an example is provided to demonstrate the effectiveness of the present approach.
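The abstract does not state the control law itself. As a rough illustration of how a critic NN alone can yield a saturation-respecting control, the sketch below uses the tanh-based bounded control that is common in the constrained-input HJB literature (for example, the approach of reference [2]). Whether the paper uses exactly this form is an assumption; the feature vector, weights, input matrix `g`, and bound `u_max` are all hypothetical values chosen for the example.

```python
import numpy as np

def critic_gradient(w, x):
    """Gradient of a hypothetical critic V(x) ~ w^T phi(x),
    with quadratic features phi(x) = [x1^2, x1*x2, x2^2]."""
    x1, x2 = x
    # Row i holds the partial derivatives of phi_i with respect to (x1, x2).
    dphi = np.array([[2.0 * x1, 0.0],
                     [x2, x1],
                     [0.0, 2.0 * x2]])
    return dphi.T @ w

def saturated_control(grad_V, g, R_inv, u_max):
    """Bounded control u = -u_max * tanh(R^{-1} g^T grad_V / (2 * u_max)).
    Assumed form; each component satisfies |u_i| <= u_max by construction."""
    raw = R_inv @ g.T @ grad_V / (2.0 * u_max)
    return -u_max * np.tanh(raw)

# Hypothetical numbers, for illustration only.
x = np.array([1.0, -2.0])          # current state
w = np.array([0.5, 0.1, 1.0])      # critic weight estimate
g = np.array([[0.0], [1.0]])       # input matrix, assumed known here
R_inv = np.array([[1.0]])          # inverse of the control weighting
u = saturated_control(critic_gradient(w, x), g, R_inv, u_max=0.5)
print(u)
```

Because the control is computed directly from the critic gradient, no separate action network is needed in this sketch, which mirrors the identifier-critic idea described in the abstract.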
Pages: 2037-2047
Number of pages: 11
Related papers
31 in total
[1]   A stable neural network-based observer with application to flexible-joint manipulators [J].
Abdollahi, F ;
Talebi, HA ;
Patel, RV .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 2006, 17 (01) :118-129
[2]   Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach [J].
Abu-Khalaf, M ;
Lewis, FL .
AUTOMATICA, 2005, 41 (05) :779-791
[3]  
[Anonymous], 1999, Neural network control of robot manipulators and nonlinear systems
[4]  
[Anonymous], 1996, Neuro-dynamic programming
[5]  
[Anonymous], 2010, Neural Networks and Learning Machines
[6]   Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation [J].
Beard, RW ;
Saridis, GN ;
Wen, JT .
AUTOMATICA, 1997, 33 (12) :2159-2177
[7]  
Bellman R. E., 1957, Dynamic programming. Princeton landmarks in mathematics
[8]   A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems [J].
Bhasin, S. ;
Kamalapurkar, R. ;
Johnson, M. ;
Vamvoudakis, K. G. ;
Lewis, F. L. ;
Dixon, W. E. .
AUTOMATICA, 2013, 49 (01) :82-92
[9]   Robust fault-tolerant control against time-varying actuator faults and saturation [J].
Fan, J. H. ;
Zhang, Y. M. ;
Zheng, Z. Q. .
IET CONTROL THEORY AND APPLICATIONS, 2012, 6 (14) :2198-2208
[10]   Exponential stability and static output feedback stabilisation of singular time-delay systems with saturating actuators [J].
Haidar, A. ;
Boukas, E. K. ;
Xu, S. ;
Lam, J. .
IET CONTROL THEORY AND APPLICATIONS, 2009, 3 (09) :1293-1305