An air combat maneuver decision-making approach using coupled reward in deep reinforcement learning

Times Cited: 0
Authors
Yang, Jian [1 ]
Wang, Liangpei [1 ]
Han, Jiale [1 ]
Chen, Changdi [1 ]
Yuan, Yinlong [3 ]
Yu, Zhu Liang [1 ]
Yang, Guoli [2 ]
Affiliations
[1] South China Univ Technol, Coll Automat Sci & Engn, Guangzhou 510641, Peoples R China
[2] Adv Inst Big Data, Dept Big Data Intelligence, Beijing 100195, Peoples R China
[3] Nantong Univ, Sch Elect Engn, Nantong 226019, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Air combat; Maneuver decision-making; Deep reinforcement learning (DRL); Coupled reward;
DOI
10.1007/s40747-025-01992-9
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the domain of unmanned air combat, achieving efficient autonomous maneuvering decisions presents challenges. Deep reinforcement learning (DRL) is one approach to tackling this problem. The final performance of a DRL algorithm is directly affected by the design of its reward functions; however, unreasonable reward weights degrade both model performance and convergence speed. Therefore, a method named Coupled Reward-Deep Reinforcement Learning (CR-DRL) is introduced to address this problem. Specifically, we propose a novel coupled-weight reward function for DRL within the air combat framework. The new reward function integrates angle and distance, so our DRL maneuver decision model can be trained faster and perform better than models that use conventional reward functions. Additionally, we establish a new competitive training framework designed to enhance the performance of our model against personalized opponents. The experimental results show that our CR-DRL model outperforms the traditional model that uses fixed-weight reward functions in this training framework, with a 6.3% increase in average reward in fixed scenarios and a 22.8% increase in changeable scenarios. Moreover, the performance of our model continues to improve as training iterations increase, ultimately exhibiting a degree of generalization against similar opponents. Finally, we develop a simulation environment based on Unity3D that supports real-time air combat, called Airfightsim, to demonstrate the performance of the proposed algorithm.
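The abstract contrasts a fixed-weight reward, which sums angle and distance terms with constant coefficients, against the proposed coupled-weight reward, in which the two terms modulate each other. The paper's exact formulation is not given in this record; the snippet below is a minimal, hypothetical Python sketch of the general idea (the function names, weight schedule, and normalization are assumptions for illustration only).

```python
def fixed_weight_reward(angle_adv: float, dist_adv: float,
                        w_a: float = 0.5, w_d: float = 0.5) -> float:
    """Conventional baseline: angle and distance advantages combined with fixed weights."""
    return w_a * angle_adv + w_d * dist_adv


def coupled_reward(angle_adv: float, dist_adv: float) -> float:
    """Hypothetical coupled-weight reward: the weight on each term depends on the
    other advantage, so the angle term counts for less when the distance is
    unfavorable, and vice versa. Both advantages are assumed normalized to [0, 1]."""
    w_a = 0.5 + 0.5 * dist_adv   # angle weight grows with the distance advantage
    w_d = 0.5 + 0.5 * angle_adv  # distance weight grows with the angle advantage
    return w_a * angle_adv + w_d * dist_adv


# Example: good angle but poor distance advantage
print(fixed_weight_reward(0.9, 0.2))  # 0.55
print(coupled_reward(0.9, 0.2))       # 0.73 (illustrative only; not the paper's values)
```

Under this kind of coupling, neither term can dominate the shaped reward on its own, which is the stated motivation for faster training and better final performance compared with fixed weights.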
Pages: 17