Error Dynamics Based Dual Heuristic Dynamic Programming for Self-Learning Flight Control

Cited by: 4
Authors
Huang, Xu [1, 2]
Zhang, Yuan [1, 2]
Liu, Jiarun [1, 2]
Zhong, Honghao [1]
Wang, Zhaolei [1, 2]
Peng, Yue [1]
Affiliations
[1] Beijing Aerosp Automat Control Inst, Beijing 100854, Peoples R China
[2] Natl Key Lab Sci & Technol Aerosp Intelligence Con, Beijing 100854, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 01
Funding
National Natural Science Foundation of China;
Keywords
error dynamics; dual heuristic dynamic programming; recursive least square method; online self-learning; attitude control; OPTIMAL TRACKING;
DOI
10.3390/app13010586
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
A data-driven nonlinear control approach, called error dynamics-based dual heuristic dynamic programming (ED-DHP), is proposed for air vehicle attitude control. To solve the optimal tracking control problem, the augmented system is defined by the derived error dynamics and the reference trajectory so that the actor neural network can learn the feedforward and feedback control terms at the same time. During the online self-learning process, the actor neural network learns the control policy by minimizing the augmented system's value function. The input dynamics identified by recursive least squares (RLS) and the output of the critic neural network are used to update the actor neural network. In addition, the total uncertainty term of the error dynamics is also identified by RLS, which compensates for the uncertainty caused by inaccurate modeling, parameter perturbation, and so on. The outputs of ED-DHP include the rough trim surface, the feedforward and feedback terms from the actor neural network, and the compensation. With this control scheme, complete knowledge of the system dynamics and the reference trajectory dynamics is not needed, and offline learning is unnecessary. To verify the self-learning ability of ED-DHP, two numerical experiments are carried out based on the established morphing air vehicle model. One is sinusoidal signal tracking at a fixed operating point, and the other is guidance command tracking with a morphing process at variable operating points. The simulation results demonstrate the good performance of ED-DHP for online self-learning attitude control and validate the robustness of the proposed scheme.
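The abstract names recursive least squares (RLS) as the online identifier for both the input dynamics and the total uncertainty term. The record does not give the regressor, forgetting factor, or initialization used in the paper, so the following is only a generic RLS sketch for a scalar linear-in-parameters model y ≈ theta · x. The function names (`rls_update`, `identify`, `demo_samples`), the forgetting factor `lam`, and the initial covariance scale `p0` are illustrative assumptions, not the authors' implementation.

```python
import math


def rls_update(theta, P, x, y, lam=0.99):
    """One RLS step with forgetting factor lam for the model y ≈ theta · x.

    theta: current parameter estimate (list of floats)
    P:     current covariance-like matrix (list of lists, symmetric)
    x:     regressor vector, y: measured scalar output
    """
    n = len(theta)
    # Px = P @ x
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Scalar normalisation term: lam + x^T P x
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    # Gain vector K = P x / (lam + x^T P x)
    K = [Px[i] / denom for i in range(n)]
    # Prediction error with the old estimate
    err = y - sum(theta[i] * x[i] for i in range(n))
    theta_new = [theta[i] + K[i] * err for i in range(n)]
    # P <- (P - K x^T P) / lam; x^T P equals Px^T because P is symmetric
    P_new = [[(P[i][j] - K[i] * Px[j]) / lam for j in range(n)]
             for i in range(n)]
    return theta_new, P_new


def identify(samples, n, lam=0.99, p0=1e4):
    """Run RLS over (x, y) samples, starting from theta = 0 and P = p0 * I."""
    theta = [0.0] * n
    P = [[p0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for x, y in samples:
        theta, P = rls_update(theta, P, x, y, lam)
    return theta


def demo_samples(true_theta=(2.0, -0.5), steps=200):
    """Noise-free synthetic data (hypothetical, for illustration only)."""
    data = []
    for k in range(steps):
        x = [math.sin(0.3 * k), math.cos(0.7 * k)]
        y = true_theta[0] * x[0] + true_theta[1] * x[1]
        data.append((x, y))
    return data


if __name__ == "__main__":
    print(identify(demo_samples(), n=2))
```

A forgetting factor below 1 discounts old samples, which is one common way to let the estimate follow slowly varying parameters; whether the paper uses this variant is not stated in the abstract.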
Pages: 16