A Hierarchical Deep Reinforcement Learning Framework for 6-DOF UCAV Air-to-Air Combat

Citations: 35
Authors
Chai, Jiajun [1 ,2 ]
Chen, Wenzhang [1 ,2 ]
Zhu, Yuanheng [1 ,2 ]
Yao, Zong-Xin [3 ]
Zhao, Dongbin [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[3] Shenyang Aircraft Design & Res Inst, Dept Unmanned Aerial Vehicle, Shenyang 110035, Peoples R China
Source
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2023, Vol. 53, Issue 9
Funding
National Natural Science Foundation of China;
Keywords
Aircraft; Aerospace control; 6-DOF; Task analysis; Nose; Missiles; Heuristic algorithms; 6-DOF unmanned combat air vehicle (UCAV); air combat; hierarchical structure; reinforcement learning (RL); self-play; LEVEL; GAME;
DOI
10.1109/TSMC.2023.3270444
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology];
Discipline Code
0812;
Abstract
Unmanned combat air vehicle (UCAV) combat is a challenging scenario with a high-dimensional continuous state and action space and highly nonlinear dynamics. In this article, we propose a general hierarchical framework to solve the within-visual-range (WVR) air-to-air combat problem under six-degree-of-freedom (6-DOF) dynamics. The core idea is to divide the whole decision-making process into two loops and use reinforcement learning (RL) to solve them separately. The outer loop uses a combat policy to decide the macro command according to the current combat situation. The inner loop then uses a control policy to execute the macro command by computing the actual input signals for the aircraft. We design the Markov decision process for the control policy and the Markov game between the two aircraft. We present a two-stage training mechanism. For the control policy, we design an effective reward function to accurately track various macro behaviors. For the combat policy, we present a fictitious self-play mechanism that improves combat performance by playing against historical combat policies. Experimental results show that the control policy achieves better tracking performance than conventional methods, and that the fictitious self-play mechanism learns a competitive combat policy that attains high winning rates against conventional methods.
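The abstract's two-loop hierarchy can be summarized in a minimal sketch. All names, state shapes, and the macro-command set below are illustrative assumptions, not the authors' actual implementation; the learned policies are replaced by trivial placeholders to show only the control flow between the outer (combat) and inner (control) loops.

```python
import numpy as np

# Assumed macro-command set for illustration only.
MACRO_COMMANDS = ["pursue", "evade", "climb", "dive"]

class CombatPolicy:
    """Outer loop: maps the current combat situation to a macro command."""
    def act(self, situation: np.ndarray) -> int:
        # Placeholder for a learned RL policy (e.g., a neural network).
        return int(np.argmax(situation[: len(MACRO_COMMANDS)]))

class ControlPolicy:
    """Inner loop: tracks the macro command with 6-DOF control inputs."""
    def act(self, state: np.ndarray, macro: int) -> np.ndarray:
        # Placeholder: emits [aileron, elevator, rudder, throttle]-style signals.
        goal = np.eye(len(MACRO_COMMANDS))[macro]
        return np.tanh(state[:4] + goal)

def decision_step(situation, state, combat, control):
    macro = combat.act(situation)         # outer loop: choose macro command
    controls = control.act(state, macro)  # inner loop: compute input signals
    return macro, controls

combat, control = CombatPolicy(), ControlPolicy()
macro, controls = decision_step(np.ones(8), np.zeros(12), combat, control)
print(MACRO_COMMANDS[macro], controls.shape)
```

In the paper's two-stage training, the inner control policy would first be trained to track each macro behavior, after which the outer combat policy is trained (via fictitious self-play) on top of the frozen controller.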
Pages: 5417-5429
Page count: 13
Related Papers
50 records
[31]   Maneuver Strategy Generation of UCAV for within Visual Range Air Combat Based on Multi-Agent Reinforcement Learning and Target Position Prediction [J].
Kong, Weiren ;
Zhou, Deyun ;
Yang, Zhen ;
Zhang, Kai ;
Zeng, Lina .
APPLIED SCIENCES-BASEL, 2020, 10 (15)
[32]   Autonomous Air Combat with Reinforcement Learning under Different Noise Conditions [J].
Tasbas, Ahmet Semih ;
Serbest, Sanberk ;
Sahin, Safa Onur ;
Ure, Nazim Kemal .
2023 31ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU, 2023,
[33]   Autonomous Maneuver Decision Making of Dual-UAV Cooperative Air Combat Based on Deep Reinforcement Learning [J].
Hu, Jinwen ;
Wang, Luhe ;
Hu, Tianmi ;
Guo, Chubing ;
Wang, Yanxiong .
ELECTRONICS, 2022, 11 (03)
[34]   A hierarchical reinforcement learning framework for multi-UAV combat using leader-follower strategy [J].
Pang, Jinhui ;
He, Jinglin ;
Mohamed, Noureldin Mohamed Abdelaal Ahmed ;
Lin, Changqing ;
Zhang, Zhihui ;
Hao, Xiaoshuai .
KNOWLEDGE-BASED SYSTEMS, 2025, 316
[35]   2-D Air Combat Maneuver Decision Using Reinforcement Learning [J].
Tasbas, Ahmet Semih ;
Aydinli, Sevket Utku .
2021 7TH INTERNATIONAL CONFERENCE ON ENGINEERING AND EMERGING TECHNOLOGIES (ICEET 2021), 2021, :740-745
[36]   Leveraging JSBSim and Gymnasium: A Reinforcement Learning Approach for Air Combat Simulation [J].
Salhi, Abderahim ;
Jabour, Joseph E. ;
Arnolds, Thomas L. ;
Ross, James E. ;
Dozier, Haley R. .
APPLIED COGNITIVE COMPUTING AND ARTIFICIAL INTELLIGENCE, ACC 2024, ICAI 2024, 2025, 2251 :271-283
[37]   Cross coordination of behavior clone and reinforcement learning for autonomous within-visual-range air combat [J].
Li, Lun ;
Zhang, Xuebo ;
Qian, Chenxu ;
Zhao, Minghui ;
Wang, Runhua .
NEUROCOMPUTING, 2024, 584
[38]   An Empirical Study of Reward Structures for Actor-Critic Reinforcement Learning in Air Combat Manoeuvring Simulation [J].
Kurniawan, Budi ;
Vamplew, Peter ;
Papasimeon, Michael ;
Dazeley, Richard ;
Foale, Cameron .
AI 2019: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, 11919 :54-65
[39]   SD2AC: A reinforcement learning framework using distribution evaluation and sequential decision-making for UCAV combat [J].
Yang, Tao ;
Shi, Xinhao ;
Xu, Cheng ;
Yang, Yulin ;
Liu, Hongzhe ;
Zeng, Qinghan .
JOURNAL OF COMPUTATIONAL DESIGN AND ENGINEERING, 2025, 12 (07) :96-112
[40]   2v2 Air Combat Confrontation Strategy Based on Reinforcement Learning [J].
Wang, Jinlin ;
Zhu, Longtao ;
Yang, Hongyu ;
Ji, Yulong ;
Wang, Xiaoming .
PROCEEDINGS OF 2022 INTERNATIONAL CONFERENCE ON AUTONOMOUS UNMANNED SYSTEMS, ICAUS 2022, 2023, 1010 :1368-1377