Enhanced missile hit probability actor-critic algorithm for autonomous decision-making in air-to-air confrontation

Cited: 3
Authors
Chen, Can [1 ]
Mo, Li [1 ]
Lv, Maolong [2 ]
Lin, Defu [1 ]
Song, Tao [1 ]
Cao, Jinde [3 ,4 ]
Affiliations
[1] Beijing Inst Technol, Sch Aerosp Engn, Beijing 100081, Peoples R China
[2] Air Force Engn Univ, Air Traff Control & Nav Sch, Xian 710051, Peoples R China
[3] Southeast Univ, Sch Math, Nanjing 211189, Peoples R China
[4] Ahlia Univ, Manama 10878, Bahrain
Keywords
Air-to-air confrontation; Autonomous decision-making; Missile hit probability; Reinforcement learning; Actor-critic; COMBAT; FRAMEWORK;
DOI
10.1016/j.ast.2024.109285
Chinese Library Classification
V [Aeronautics, Astronautics];
Discipline Code
08 ; 0825 ;
Abstract
In recent years, autonomous decision-making has emerged as a critical technology in air-to-air confrontation scenarios, garnering significant attention. This paper presents a novel AI algorithm, the Missile Hit Probability Enhanced Actor-Critic (MHPAC), designed for autonomous decision-making in such confrontations, whose primary objective is to maximize the probability of defeating opponents while minimizing the risk of being shot down. By incorporating a pre-trained Missile Hit Probability (MHP) model into reward shaping and exploration within the framework of Reinforcement Learning (RL), the MHPAC algorithm enhances the learning capabilities of the Actor-Critic (AC) algorithm specifically tailored for air-to-air confrontation scenarios. Furthermore, the MHP model is also integrated into the confrontation strategy to inform missile launch decisions. Using the MHPAC algorithm, the confrontation strategy is obtained through curriculum learning and self-play learning. Results demonstrate that the MHPAC algorithm effectively explores efficient maneuvering strategies for missile launch and defense, overcoming challenges associated with sparse and delayed reward signals. The decision-making capabilities of the integrated maneuvering and missile launch strategy are significantly enhanced by the proposed MHPAC algorithm, with a relative win ratio of over 65% against different strategies. Moreover, the trained strategy needs only 0.039 s for real-time decision-making. This research holds considerable promise for achieving air superiority and mission success in complex and dynamic aerial environments.
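The abstract's core idea of folding a pre-trained MHP model into reward shaping can be illustrated with a minimal sketch. All names here (`mhp_model`, `shaped_reward`, the coefficients `gamma` and `beta`, and the toy range-based probability) are illustrative assumptions, not taken from the paper; the sketch only shows the general pattern of using a hit-probability potential to densify an otherwise sparse combat reward.

```python
# Hypothetical sketch of MHP-based reward shaping (illustrative names only):
# the sparse combat reward is augmented with the discounted change in missile
# hit probability, yielding a dense learning signal for the actor-critic.

def mhp_model(state):
    # Stand-in for the pre-trained Missile Hit Probability model: here a toy
    # function of relative range, clipped to [0, 1]. The real model would be
    # a network trained on missile engagement outcomes.
    return max(0.0, min(1.0, 1.0 - state["range_km"] / 50.0))

def shaped_reward(prev_state, state, sparse_reward, gamma=0.99, beta=0.1):
    # Potential-based shaping: r' = r + beta * (gamma * Phi(s') - Phi(s)),
    # with the potential Phi taken as the MHP of own aircraft vs. opponent.
    # Closing to a more favorable launch geometry raises Phi and thus r'.
    return sparse_reward + beta * (gamma * mhp_model(state)
                                   - mhp_model(prev_state))

# Closing range from 40 km to 30 km yields a small positive shaping bonus
# even when the sparse reward is zero.
bonus = shaped_reward({"range_km": 40.0}, {"range_km": 30.0}, 0.0)
```

Potential-based shaping of this form is known to preserve the optimal policy of the underlying task, which is one plausible reason such an MHP term can guide exploration without distorting the confrontation objective.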
Pages: 13
Cited References
41 in total
[2]  
[Anonymous], About us
[3]   Deep Reinforcement Learning-Based Air-to-Air Combat Maneuver Generation in a Realistic Environment [J].
Bae, Jung Ho ;
Jung, Hoseong ;
Kim, Seogbong ;
Kim, Sungho ;
Kim, Yong-Duk .
IEEE ACCESS, 2023, 11 :26427-26440
[4]  
Bellemare MG, 2016, ADV NEUR IN, V29
[5]  
Berner C., 2019, arXiv, DOI 10.48550/arXiv.1912.06680
[6]   Feature selection in image analysis: a survey [J].
Bolon-Canedo, Veronica ;
Remeseiro, Beatriz .
ARTIFICIAL INTELLIGENCE REVIEW, 2020, 53 (04) :2905-2931
[7]  
Burda Y, 2018, arXiv, DOI 10.48550/arXiv.1810.12894
[8]   Adversarial Swarm Defence Using Multiple Fixed-Wing Unmanned Aerial Vehicles [J].
Choi, Joonwon ;
Seo, Minguk ;
Shin, Hyo-Sang ;
Oh, Hyondong .
IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, 2022, 58 (06) :5204-5219
[9]   Guidance and control for own aircraft in the autonomous air combat: A historical review and future prospects [J].
Dong, Yiqun ;
Ai, Jianliang ;
Liu, Jiquan .
PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART G-JOURNAL OF AEROSPACE ENGINEERING, 2019, 233 (16) :5943-5991
[10]  
Dongyuan H., 2020, Acta Aeronaut. Astronaut. Sin., V41