Off-Policy Actor-Critic Structure for Optimal Control of Unknown Systems With Disturbances

Cited by: 198
Authors
Song, Ruizhuo [1 ]
Lewis, Frank L. [2 ,3 ]
Wei, Qinglai [4 ]
Zhang, Huaguang [5 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
[2] Univ Texas Arlington, UTA Res Inst, Ft Worth, TX 76118 USA
[3] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110004, Peoples R China
[4] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
[5] Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110004, Peoples R China
Funding
U.S. National Science Foundation; Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
Adaptive critic designs; adaptive/approximate dynamic programming (ADP); dynamic programming; off-policy; optimal control; unknown system; OPTIMAL TRACKING CONTROL; ADAPTIVE OPTIMAL-CONTROL; TIME NONLINEAR-SYSTEMS; OPTIMAL-CONTROL SCHEME; FEEDBACK-CONTROL; ALGORITHM; ITERATION; DESIGN;
DOI
10.1109/TCYB.2015.2421338
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology];
Discipline code
0812;
Abstract
An optimal control method is developed in this paper for unknown continuous-time systems with unknown disturbances. An integral reinforcement learning (IRL) algorithm is presented to obtain the iterative control law, and off-policy learning allows the system dynamics to remain completely unknown. Neural networks are used to construct the critic and action networks. It is shown that, in the presence of unknown disturbances, off-policy IRL may fail to converge or may produce a biased solution. To reduce the influence of the unknown disturbances, a disturbance compensation controller is added. Based on Lyapunov techniques, the weight errors are proven to be uniformly ultimately bounded, and convergence of the Hamiltonian function is also proven. A simulation study demonstrates the effectiveness of the proposed optimal control method for unknown systems with disturbances.
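To illustrate the policy-iteration backbone that the abstract describes, the sketch below runs Kleinman-style policy iteration on a scalar LQR problem. This is a minimal, model-based illustration only: the paper's off-policy IRL algorithm performs the same evaluate/improve cycle but estimates the value function from trajectory data, with neural networks and a disturbance compensator, rather than from the known coefficients `a` and `b` used here. All names and the scalar setting are illustrative assumptions, not the paper's implementation.

```python
import math

def lqr_policy_iteration(a, b, q, r, k0=0.0, tol=1e-12, max_iter=100):
    """Kleinman-style policy iteration for the scalar LQR problem
        x' = a*x + b*u,   cost = integral of (q*x^2 + r*u^2) dt.
    Policy evaluation solves the scalar Lyapunov equation
        2*(a - b*k)*P + q + r*k**2 = 0,
    and policy improvement sets k <- b*P/r. Off-policy IRL performs
    these same two steps, but from measured data instead of (a, b)."""
    k = k0
    P = 0.0
    for _ in range(max_iter):
        ac = a - b * k                  # closed-loop dynamics
        assert ac < 0, "current policy must be stabilizing"
        P_new = (q + r * k * k) / (-2.0 * ac)   # Lyapunov solve
        k = b * P_new / r                       # greedy policy update
        if abs(P_new - P) < tol:
            return P_new, k
        P = P_new
    return P, k

# With a=-1, b=1, q=r=1, the Riccati equation -2P - P^2 + 1 = 0
# has positive root P = sqrt(2) - 1, and the optimal gain is k = P.
P, k = lqr_policy_iteration(a=-1.0, b=1.0, q=1.0, r=1.0)
```

The iteration converges quadratically from any stabilizing initial gain; the disturbance issue studied in the paper arises because, when learning `P` from data, an unknown disturbance contaminates the evaluation step and biases this fixed point.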
Pages
1041-1050 (10 pages)