Synergetic learning for unknown nonlinear H∞ control using neural networks

Cited by: 0
Authors
Zhu, Liao [1 ,2 ]
Guo, Ping [1 ,2 ]
Wei, Qinglai [3 ,4 ,5 ]
Affiliations
[1] Beijing Normal Univ, Int Acad Ctr Complex Syst, Zhuhai 519087, Guangdong, Peoples R China
[2] Beijing Normal Univ, Sch Syst Sci, Beijing 100875, Peoples R China
[3] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
[4] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[5] Macau Univ Sci & Technol, Inst Syst Engn, Taipa 999078, Macau, Peoples R China
Keywords
H∞ control; Nonlinear systems; Adaptive dynamic programming; Temporal difference; Neural network; Data-driven; STATE-FEEDBACK CONTROL; ZERO-SUM GAMES; POLICY UPDATE ALGORITHM; SYSTEMS; EQUATION
DOI
10.1016/j.neunet.2023.09.029
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The well-known H∞ control design endows a controller with robustness by rejecting perturbations from the external environment, which is difficult to achieve for completely unknown affine nonlinear systems. Accordingly, the immediate objective of this paper is to develop an online, real-time synergetic learning algorithm, so that a data-driven H∞ controller can be obtained. By converting the H∞ control problem into a two-player zero-sum game, a model-free Hamilton-Jacobi-Isaacs equation (MF-HJIE) is first derived using off-policy reinforcement learning, followed by a proof of equivalence between the MF-HJIE and the conventional HJIE. Next, by applying temporal differences to the MF-HJIE, a synergetic evolutionary rule with experience replay is designed to learn the optimal value function, the optimal control, and the worst perturbation, all of which can be performed online and in real time along the system state trajectory. It is proven that the synergetic learning system constructed from the system plant and the evolutionary rule is uniformly ultimately bounded. Finally, simulation results on an F16 aircraft system and a nonlinear system demonstrate the tractability of the proposed method.
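For context, the two-player zero-sum formulation the abstract invokes is standard; the sketch below uses conventional textbook notation (the affine system class, cost weights Q and R, and attenuation level γ are the usual choices, not taken from the paper's model-free derivation):

```latex
% Standard zero-sum setup behind nonlinear H-infinity control:
% the control u minimizes, the perturbation w maximizes,
% and gamma is the prescribed attenuation level.
\begin{align*}
\dot{x} &= f(x) + g(x)\,u + k(x)\,w,\\
V(x_0)  &= \min_{u}\,\max_{w}\int_{0}^{\infty}
           \big(Q(x) + u^{\top}R\,u - \gamma^{2}\lVert w\rVert^{2}\big)\,dt,\\
0 &= Q(x) + \nabla V^{\top} f(x)
     - \tfrac{1}{4}\,\nabla V^{\top} g(x)\,R^{-1} g(x)^{\top} \nabla V
     + \tfrac{1}{4\gamma^{2}}\,\nabla V^{\top} k(x)\,k(x)^{\top} \nabla V
     \quad \text{(HJIE)},\\
u^{*}(x) &= -\tfrac{1}{2}\,R^{-1} g(x)^{\top}\nabla V, \qquad
w^{*}(x) = \tfrac{1}{2\gamma^{2}}\,k(x)^{\top}\nabla V.
\end{align*}
```

And a minimal numerical sketch of the temporal-difference-with-experience-replay idea the abstract describes, under generic assumptions: the basis `phi`, the integral reward `r_int`, and the learning rate are illustrative placeholders, not the authors' synergetic evolutionary rule.

```python
import numpy as np

def phi(x):
    """Hypothetical quadratic basis for the critic V(x) ~ theta . phi(x)."""
    x1, x2 = x
    return np.array([x1 * x1, x1 * x2, x2 * x2])

def td_replay_step(theta, batch, lr=0.05):
    """One pass of gradient descent on the squared TD residual of the
    integral Bellman equation, over a replayed batch of transitions
    (x_t, x_next, r_int); r_int approximates the integral of
    Q(x) + u'Ru - gamma^2 ||w||^2 over one sampling interval."""
    for x_t, x_next, r_int in batch:
        delta = r_int + theta @ phi(x_next) - theta @ phi(x_t)  # TD residual
        theta = theta - lr * delta * (phi(x_next) - phi(x_t))   # descent step
    return theta

# Replay buffer of synthetic transitions (placeholders for measured data).
rng = np.random.default_rng(0)
buffer = [(rng.normal(size=2), rng.normal(size=2), rng.normal())
          for _ in range(64)]

theta = np.zeros(3)
for _ in range(200):  # replay stored experience repeatedly
    idx = rng.choice(len(buffer), size=16)
    theta = td_replay_step(theta, [buffer[i] for i in idx])
print("critic weights:", theta)
```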
Pages: 287-299
Number of pages: 13
Related Papers
46 records in total
[1] Abu-Khalaf M, Lewis F L, Huang J. Policy iterations on the Hamilton-Jacobi-Isaacs equation for H∞ state feedback control with input saturation. IEEE Transactions on Automatic Control, 2006, 51(12): 1989-1995.
[2] Al-Tamimi A, Abu-Khalaf M, Lewis F L. Adaptive critic designs for discrete-time zero-sum games with application to H∞ control. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2007, 37(1): 240-247.
[3] Basar T, Olsder G J. Dynamic Noncooperative Game Theory. SIAM, 1999.
[4] Basar T, Bernhard P. H∞-Optimal Control and Related Minimax Design Problems: A Dynamic Game Approach. Birkhäuser, 1995.
[5] Cao L, Pan Y, Liang H, Huang T. Observer-based dynamic event-triggered control for multiagent systems with time-varying delay. IEEE Transactions on Cybernetics, 2023, 53(5): 3376-3387.
[6] Chen Q, Zhang A, Song Y. Intrinsic plasticity-based neuroadaptive control with both weights and excitability tuning. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(7): 3282-3286.
[7] Doyle J C, Glover K, Khargonekar P P, Francis B A. State-space solutions to standard H2 and H∞ control problems. IEEE Transactions on Automatic Control, 1989, 34(8): 831-847.
[8] Guo P. arXiv preprint arXiv:2006.06367, 2020.
[9] Hornik K, Stinchcombe M, White H. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks, 1990, 3(5): 551-560.
[10] Jeffreys H. Methods of Mathematical Physics. 3rd ed. Cambridge Mathematical Library, Cambridge University Press, 1999.