Memory-extraction-based DRL cooperative guidance against the maneuvering target protected by interceptors

Times Cited: 0
Authors
Sun, Hao [1 ]
Yan, Shi [1 ]
Liang, Yan [1 ]
Ma, Chaoxiong [1 ]
Zhang, Tao [2 ]
Pei, Liuyu [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Automat, Xian 710072, Shaanxi, Peoples R China
[2] Air Force Engn Univ, Sch Air Traff Control & Nav, Xian 710051, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Missiles; Cooperative guidance; Spatio-temporal memory extraction; Multi-order Markov decision process; Deep reinforcement learning; Maneuvering target;
DOI
10.1016/j.ast.2024.109575
Chinese Library Classification (CLC)
V [Aeronautics, Astronautics];
Discipline Classification Code
08 ; 0825 ;
Abstract
This paper addresses an open problem for missiles: achieving cooperative guidance under collaborative-parameter constraints despite interference from pursuing interceptors (INTs) and a maneuvering target, a difficulty rooted in the complex, time-varying relationships induced by the target-missile-interceptor (TMI) engagement. A Memory-Extraction-based Soft Actor-Critic (ME-SAC) approach is proposed that enhances the collaborative performance of the missiles by implicitly extracting the coupled motion characteristics of the TMI participants from historical states, thereby jointly optimizing situation awareness and strategy. First, the cooperative guidance task is formulated as a multi-order Markov decision process (MOMDP) to better represent the dynamic evolution of the engagement, and a memory-extraction process is introduced to alleviate the curse of dimensionality. Second, a memory-decision-oriented maximum entropy framework combined with memory update modules is designed to strengthen the strategy search. Then, domain-knowledge-based pre-training is applied to accelerate convergence. Finally, in simulation evaluations across various scenarios, the proposed ME-SAC proves more promising than typical DRL-based and model-based algorithms in task success rate, learning efficiency, and adaptability.
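The record does not specify the ME-SAC architecture, so the following is an illustrative sketch only of the general idea the abstract describes: in a multi-order MDP the policy conditions on the last k states, pi(a_t | s_{t-k+1}, ..., s_t), and a memory-extraction module compresses that window into a fixed-size vector driving a squashed-Gaussian SAC policy. The class name, dimensions, and the choice of a GRU are assumptions here, not the authors' implementation.

# Illustrative sketch (not the authors' code): a memory-extraction actor
# for a SAC-style agent. A GRU compresses a window of recent
# target-missile-interceptor (TMI) states into a memory vector that
# conditions a squashed-Gaussian policy head, as in standard SAC.
import torch
import torch.nn as nn
from torch.distributions import Normal

class MemoryExtractionActor(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        # Assumed memory-extraction module: a GRU over the state history.
        self.memory = nn.GRU(state_dim, hidden, batch_first=True)
        self.mu = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, history: torch.Tensor):
        # history: (batch, window, state_dim) -- the k most recent TMI
        # states, i.e., the input of a k-th-order decision process.
        _, h = self.memory(history)        # h: (1, batch, hidden)
        h = h.squeeze(0)                   # fixed-size memory vector
        mu = self.mu(h)
        log_std = self.log_std(h).clamp(-20.0, 2.0)
        dist = Normal(mu, log_std.exp())
        raw = dist.rsample()               # reparameterized sample
        action = torch.tanh(raw)           # bounded command, e.g., normalized acceleration
        # tanh change-of-variables correction for the action log-likelihood
        log_prob = (dist.log_prob(raw) - torch.log(1.0 - action.pow(2) + 1e-6)).sum(-1)
        return action, log_prob

# Usage: a batch of 4 histories, each a window of 10 six-dimensional TMI states.
actor = MemoryExtractionActor(state_dim=6, action_dim=2)
act, logp = actor(torch.randn(4, 10, 6))
print(act.shape, logp.shape)               # torch.Size([4, 2]) torch.Size([4])

Feeding the policy a state window rather than a single state is what distinguishes this from a first-order SAC actor; the recurrent encoder plays the role the abstract assigns to the memory-extraction process, trading the full history for a learned fixed-size summary to avoid the curse of dimensionality.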
Pages: 18