Near optimal output feedback control of nonlinear discrete-time systems based on reinforcement neural network learning

Cited by: 38
Authors
Zhao, Qiming [1 ]
Xu, Hao [2 ]
Jagannathan, Sarangapani [3 ]
Affiliations
[1] DENSO International America Inc., 48033, MI
[2] College of Science and Engineering, Texas A&M University, 78414, TX
[3] Department of Electrical and Computer Engineering, Missouri University of Science and Technology, 65401, MO
Source
Corresponding author: Zhao, Qiming (qzfyc@mst.edu) | Institute of Electrical and Electronics Engineers Inc. | Vol. 01
Keywords
approximate dynamic programming; finite-horizon; Hamilton-Jacobi-Bellman equation; neural network; optimal regulation
DOI
10.1109/JAS.2014.7004665
Abstract
In this paper, the output-feedback-based finite-horizon near-optimal regulation of nonlinear affine discrete-time systems with unknown system dynamics is considered by using neural networks (NNs) to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation. First, an NN-based Luenberger observer is proposed to reconstruct both the system states and the control coefficient matrix. Next, a reinforcement learning methodology with an actor-critic structure is utilized to approximate the time-varying solution of the HJB equation, referred to as the value function, by using an NN. To properly satisfy the terminal constraint, a new error term is defined and incorporated into the NN update law so that the terminal constraint error is also minimized over time. An NN with constant weights and a time-dependent activation function is employed to approximate the time-varying value function, which is subsequently utilized to generate the finite-horizon control policy; the policy is near optimal rather than optimal due to NN reconstruction errors. The proposed scheme functions in a forward-in-time manner without an offline training phase. Lyapunov analysis is used to investigate the stability of the overall closed-loop system. Simulation results are given to show the effectiveness and feasibility of the proposed method. © 2014 IEEE.
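As a rough illustration of the setting the abstract describes (a sketch under assumed generic notation: the symbols x_k, u_k, f, g, Q, R, psi, W, phi and the horizon N below are illustrative, not taken from the paper itself): for an affine discrete-time system, the finite-horizon value function satisfies a discrete-time HJB recursion pinned by a terminal constraint, and the abstract's central device is approximating this time-varying value function with constant NN weights and a time-dependent activation vector.

% Sketch only: generic finite-horizon HJB notation, assumed for illustration.
\begin{align*}
  & x_{k+1} = f(x_k) + g(x_k)\,u_k, \qquad k = 0, 1, \ldots, N-1, \\
  % Value function: running cost plus cost-to-go, with the terminal constraint
  & V^{*}(x_k, k) = \min_{u_k}\left[\, Q(x_k) + u_k^{\top} R\, u_k + V^{*}(x_{k+1},\, k+1) \,\right],
    \qquad V^{*}(x_N, N) = \psi(x_N), \\
  % Constant weights W, time-dependent activation \phi: the time-to-go N - k
  % enters through the activation so one NN can track the time-varying solution
  & V^{*}(x_k, k) \approx W^{\top} \phi(x_k,\, N-k).
\end{align*}

Because W is constant while phi depends explicitly on the time-to-go N - k, the critic weights can be tuned online and forward in time; the terminal mismatch W^T phi(x_N, 0) - psi(x_N) is the kind of additional error term the abstract says is folded into the NN update law.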
Pages: 372-384
Number of pages: 12