H∞ control with constrained input for completely unknown nonlinear systems using data-driven reinforcement learning method

Cited by: 41
Authors
Jiang, He [1]
Zhang, Huaguang [1]
Luo, Yanhong [1]
Cui, Xiaohong [1]
Affiliations
[1] Northeastern Univ, Coll Informat Sci & Engn, Box 134, Shenyang 110819, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Reinforcement learning; Adaptive dynamic programming; Data-driven; Neural networks; OPTIMAL TRACKING CONTROL; DYNAMIC-PROGRAMMING ALGORITHM; DIFFERENTIAL GRAPHICAL GAMES; POLICY UPDATE ALGORITHM; ZERO-SUM GAME; FEEDBACK-CONTROL; CONTROL DESIGN; TIME-SYSTEMS; ITERATION; SYNCHRONIZATION;
DOI
10.1016/j.neucom.2016.11.041
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper investigates the H-infinity control problem for nonlinear systems with completely unknown dynamics and constrained control input by utilizing a novel data-driven reinforcement learning method. The nonlinear H-infinity control problem is known to rely on the solution of the Hamilton-Jacobi-Isaacs (HJI) equation, which is a nonlinear partial differential equation that generally cannot be solved analytically. To overcome this difficulty, we first propose a model-based simultaneous policy update algorithm that learns the solution of the HJI equation iteratively, and we provide its convergence proof. Then, based on this model-based method, we develop a data-driven model-free algorithm, which requires only real system sampling data generated by arbitrary control inputs and external disturbances instead of an accurate system model, and we prove that the two algorithms are equivalent. To implement the model-free algorithm, three neural networks (NNs) are employed to approximate the iterative performance index function, the control policy, and the disturbance policy, respectively, and the least-squares approach is used to minimize the NN approximation residual errors. Finally, the proposed scheme is tested on the rotational/translational actuator nonlinear system.
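The least-squares step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it only shows how the weights of a linear-in-weights approximator might be fitted to sampled data by least squares. The polynomial basis, the two-dimensional state, the sample data, and the names `basis` and `fit_weights` are all hypothetical choices made for illustration.

```python
import numpy as np


def basis(x):
    """Hypothetical polynomial activation vector for a 2-state system."""
    x1, x2 = x[..., 0], x[..., 1]
    return np.stack([x1**2, x1 * x2, x2**2], axis=-1)


def fit_weights(states, targets):
    """Return weights w minimizing ||basis(states) @ w - targets||^2."""
    phi = basis(states)
    weights, *_ = np.linalg.lstsq(phi, targets, rcond=None)
    return weights


# Synthetic, noise-free samples (assumed data, not from the paper).
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(200, 2))
true_w = np.array([1.0, 0.5, 2.0])
targets = basis(states) @ true_w

weights = fit_weights(states, targets)
```

In a data-driven scheme of the kind the abstract describes, the targets would presumably come from measured trajectories rather than a known function, and separate weight vectors would be fitted for the performance index, control policy, and disturbance policy approximators.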
Pages: 226-234
Page count: 9