A Deep Deterministic Policy Gradient Learning Approach to Missile Autopilot Design

Cited by: 13
Authors
Candeli, Angelo [1]
de Tommasi, Gianmaria [2 ]
Lui, Dario Giuseppe [2 ]
Mele, Adriano [3 ]
Santini, Stefania [2 ]
Tartaglione, Gaetano [4 ]
Affiliations
[1] MBDA Italia SpA, I-00131 Rome, Italy
[2] Univ Naples Federico II, Dept Elect Engn & Informat Technol, I-80125 Naples, Italy
[3] Univ Tuscia, Dipartimento Econ Ingn Soc & Impresa DEIM, I-01100 Viterbo, Italy
[4] Univ Naples Parthenope, Dept Engn, I-80143 Naples, Italy
Keywords
Missiles; Autopilot; Aerodynamics; Training; Neural networks; Stability analysis; Optimization; Reinforcement learning; Model; Avoidance
DOI
10.1109/ACCESS.2022.3150926
CLC classification
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
In this paper, a Deep Reinforcement Learning algorithm known as Deep Deterministic Policy Gradient (DDPG) is applied to the design of a missile lateral acceleration control system. To this end, the autopilot control problem is recast in the Reinforcement Learning framework: the environment consists of a 2-Degrees-of-Freedom nonlinear model of the missile's longitudinal dynamics, while the agent is trained on a linearized version of the model. In particular, we show how the DDPG reward function can account not only for the stabilization of the longitudinal dynamics, but also for the main performance indexes (settling time, undershoot, steady-state error, etc.). The effectiveness of the proposed DDPG-based missile autopilot is assessed through extensive numerical simulations, carried out on both the linearized and the fully nonlinear dynamics under different flight conditions and uncertainty in the aerodynamic coefficients. Its performance is compared against two model-based control strategies to verify that the proposed data-driven approach achieves the prescribed closed-loop response in a completely model-free fashion.
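The abstract states that the reward function encodes both stabilization and performance indexes (settling time, undershoot, steady-state error). The paper's exact reward is not reproduced in this record; the sketch below is a minimal, hypothetical illustration of that idea, penalizing squared acceleration-tracking error and actuator effort at each time step (all names and weights are assumptions, not taken from the paper):

```python
def autopilot_reward(a_meas, a_ref, fin_defl_rate, w_err=1.0, w_eff=0.01):
    """Illustrative per-step reward for lateral-acceleration tracking.

    Penalizes the squared tracking error (which drives down steady-state
    error and undershoot over an episode) and the squared fin deflection
    rate (which discourages aggressive actuation). The weights w_err and
    w_eff are placeholder values, not those used in the paper.
    """
    tracking_error = a_meas - a_ref
    return -(w_err * tracking_error ** 2 + w_eff * fin_defl_rate ** 2)
```

With a dense shaping of this kind, the DDPG agent receives its best (zero) reward only when the measured acceleration matches the reference with no control activity, so maximizing the return pushes the closed loop toward the prescribed tracking behavior.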
Pages: 19685-19696
Page count: 12