Online Adaptive Policy Learning Algorithm for H∞ State Feedback Control of Unknown Affine Nonlinear Discrete-Time Systems

Cited by: 199
Authors
Zhang, Huaguang [1 ,2 ]
Qin, Chunbin [1 ]
Jiang, Bin [3 ]
Luo, Yanhong [1 ]
Affiliations
[1] Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110004, Peoples R China
[2] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110004, Peoples R China
[3] Nanjing Univ Aeronaut & Astronaut, Coll Automat Engn, Nanjing 210016, Jiangsu, Peoples R China
Funding
National High-Tech R&D Program of China (863 Program); National Natural Science Foundation of China;
Keywords
Adaptive dynamic programming; H-infinity control; neural networks; nonlinear discrete-time system; zero-sum game; ZERO-SUM GAMES; OPTIMAL TRACKING CONTROL; EQUATION; DESIGN;
DOI
10.1109/TCYB.2014.2313915
Chinese Library Classification (CLC)
TP [automation technology; computer technology];
Discipline code
0812 ;
Abstract
This paper investigates the H-infinity state feedback control problem for affine nonlinear discrete-time systems with unknown dynamics. An online adaptive policy learning algorithm (APLA) based on adaptive dynamic programming (ADP) is proposed to learn, in real time, the solution to the Hamilton-Jacobi-Isaacs (HJI) equation that arises in the H-infinity control problem. In the proposed algorithm, three neural networks (NNs) are used to approximate the optimal value function and the saddle-point feedback control and disturbance policies. Novel weight-updating laws tune the critic, actor, and disturbance NNs simultaneously, using data generated in real time along the system trajectories. Taking NN approximation errors into account, the stability of the proposed algorithm is analyzed with a Lyapunov approach. Moreover, the requirement that the system input dynamics be known is relaxed by means of an NN identification scheme. Finally, simulation examples demonstrate the effectiveness of the proposed algorithm.
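The simultaneous critic/actor/disturbance updates described in the abstract can be illustrated with a toy sketch. This is not the paper's actual APLA or its weight-updating laws: it assumes a hypothetical scalar linear plant x' = a*x + b*u + k*w with a quadratic stage cost, so each "network" collapses to a single linear-in-parameter weight, and the plant parameters (a, b, k), cost weights (q, r_u), and attenuation level gamma are made-up values chosen only for demonstration.

```python
import random

# Sketch (not the paper's APLA): simultaneous critic, actor, and disturbance
# updates for a discrete-time zero-sum game on a hypothetical scalar plant
#   x' = a*x + b*u + k*w,  stage cost r(x,u,w) = q*x^2 + r_u*u^2 - gamma^2*w^2.
# Each "network" is one linear-in-parameter weight:
#   V(x) ~ Wc*x^2,  u(x) = Wa*x,  w(x) = Wd*x.

def train(steps=5000, seed=0):
    rng = random.Random(seed)
    a, b, k = 0.9, 1.0, 0.5          # assumed plant parameters (made up)
    q, r_u, gamma2 = 1.0, 1.0, 25.0  # cost weights and gamma^2 attenuation
    alpha = 0.01                     # common learning rate
    Wc = Wa = Wd = 0.0
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)    # sampled states for persistent excitation
        u, w = Wa * x, Wd * x         # current control and disturbance policies
        xn = a * x + b * u + k * w    # one-step system response
        cost = q * x * x + r_u * u * u - gamma2 * w * w
        # Critic: gradient step on the Bellman residual V(x) - (cost + V(x'))
        e = Wc * x * x - (cost + Wc * xn * xn)
        Wc -= alpha * e * x * x
        # Actor: descend the Hamiltonian gradient with respect to u
        gu = 2.0 * r_u * u + 2.0 * b * Wc * xn
        Wa -= alpha * gu * x
        # Disturbance: ascend the Hamiltonian gradient with respect to w
        gw = -2.0 * gamma2 * w + 2.0 * k * Wc * xn
        Wd += alpha * gw * x
    return Wc, Wa, Wd

Wc, Wa, Wd = train()
print(f"Wc={Wc:.3f}  Wa={Wa:.3f}  Wd={Wd:.3f}")
```

In this sketch the critic weight settles at a positive value, the actor weight becomes negative (stabilizing feedback), and the disturbance weight stays small because the attenuation level gamma is large; the paper's algorithm additionally handles unknown dynamics via NN identification, which this toy omits.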
Pages: 2706-2718
Page count: 13
Related Papers
57 records total
[1] Abu-Khalaf M, Lewis FL. Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica, 2005, 41(5): 779-791.
[2] Abu-Khalaf M, Lewis FL, Huang J. Policy iterations on the Hamilton-Jacobi-Isaacs equation for H∞ state feedback control with input saturation. IEEE Transactions on Automatic Control, 2006, 51(12): 1989-1995.
[3] Abu-Khalaf M, Lewis FL, Huang J. Neurodynamic programming and zero-sum games for constrained control systems. IEEE Transactions on Neural Networks, 2008, 19(7): 1243-1252.
[4] Al-Tamimi A. Automatica, 2010, 43: 682.
[5] Al-Tamimi A, Lewis FL, Abu-Khalaf M. Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2008, 38(4): 943-949.
[6] Al-Tamimi A, Abu-Khalaf M, Lewis FL. Adaptive critic designs for discrete-time zero-sum games with application to H∞ control. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2007, 37(1): 240-247.
[7] [Anonymous]. Neural Network Control of Robot Manipulators and Nonlinear Systems. 1999.
[8] [Anonymous]. IEEE Control Systems Magazine. 2008.
[9] [Anonymous]. Neuro-Dynamic Programming. 1996.
[10] [Anonymous]. Approximate Dynamic Programming: Solving the Curses of Dimensionality (Wiley Series in Probability and Statistics). 2007.