Near optimal output feedback control of nonlinear discrete-time systems based on reinforcement neural network learning

Cited by: 38
Authors
Zhao, Qiming [1 ]
Xu, Hao [2 ]
Jagannathan, Sarangapani [3 ]
Affiliations
[1] DENSO International America Inc., 48033, MI
[2] College of Science and Engineering, Texas A&M University, 78414, TX
[3] Department of Electrical and Computer Engineering, Missouri University of Science and Technology, 65401, MO
Source
Corresponding author: Zhao, Qiming (qzfyc@mst.edu) | Institute of Electrical and Electronics Engineers Inc. | Vol. 01
Keywords
approximate dynamic programming; finite-horizon; Hamilton-Jacobi-Bellman equation; neural network; optimal regulation
DOI
10.1109/JAS.2014.7004665
Abstract
In this paper, the output-feedback-based finite-horizon near-optimal regulation of nonlinear affine discrete-time systems with unknown system dynamics is considered by using neural networks (NNs) to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation. First, an NN-based Luenberger observer is proposed to reconstruct both the system states and the control coefficient matrix. Next, a reinforcement learning methodology with an actor-critic structure is utilized to approximate the time-varying solution of the HJB equation, referred to as the value function, by using an NN. To properly satisfy the terminal constraint, a new error term is defined and incorporated into the NN update law so that the terminal constraint error is also minimized over time. An NN with constant weights and a time-dependent activation function is employed to approximate the time-varying value function, which is subsequently utilized to generate the finite-horizon control policy; the policy is near optimal rather than optimal due to NN reconstruction errors. The proposed scheme functions in a forward-in-time manner without an offline training phase. Lyapunov analysis is used to investigate the stability of the overall closed-loop system. Simulation results are given to show the effectiveness and feasibility of the proposed method. © 2014 IEEE.
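As a rough illustration of the setting the abstract describes (a sketch under assumed generic notation: the symbols x_k, u_k, f, g, Q, R, psi, W, phi and the horizon N below are illustrative, not taken from the paper itself): for an affine discrete-time system, the finite-horizon value function satisfies a discrete-time HJB recursion pinned by a terminal constraint, and the abstract's central device is approximating this time-varying value function with constant NN weights and a time-dependent activation vector.

% Sketch only: generic finite-horizon HJB notation, assumed for illustration.
\begin{align*}
  & x_{k+1} = f(x_k) + g(x_k)\,u_k, \qquad k = 0, 1, \ldots, N-1, \\
  % Value function: running cost plus cost-to-go, with the terminal constraint
  & V^{*}(x_k, k) = \min_{u_k}\left[\, Q(x_k) + u_k^{\top} R\, u_k + V^{*}(x_{k+1},\, k+1) \,\right],
    \qquad V^{*}(x_N, N) = \psi(x_N), \\
  % Constant weights W, time-dependent activation \phi: the time-to-go N - k
  % enters through the activation so one NN can track the time-varying solution
  & V^{*}(x_k, k) \approx W^{\top} \phi(x_k,\, N-k).
\end{align*}

Because W is constant while phi depends explicitly on the time-to-go N - k, the critic weights can be tuned online and forward in time; the terminal mismatch W^T phi(x_N, 0) - psi(x_N) is the kind of additional error term the abstract says is folded into the NN update law.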
Pages: 372-384
Number of pages: 12