Research on Multi-aircraft Cooperative Air Combat Method Based on Deep Reinforcement Learning

Cited by: 0
Authors
Shi W. [1 ]
Feng Y.-H. [1 ]
Cheng G.-Q. [1 ]
Huang H.-L. [1 ]
Huang J.-C. [1 ]
Liu Z. [1 ]
He W. [2 ,3 ]
Affiliations
[1] College of Systems Engineering, National University of Defense Technology, Changsha
[2] Institute of Artificial Intelligence, University of Science and Technology Beijing, Beijing
[3] School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing
Source
Feng, Yang-He (fengyanghe@nudt.edu.cn) | 2021 / Science Press / Vol. 47
Funding
National Natural Science Foundation of China;
Keywords
Deep reinforcement learning; Enhancement mechanism; Intelligent decision; Multi-aircraft cooperative air combat; Proximal policy optimization (PPO) algorithm;
DOI
10.16383/j.aas.c201059
Abstract
Multi-aircraft cooperation is a key element of air combat, and handling the complex cooperative relationships among multiple entities is an essential problem that urgently needs to be solved. To address intelligent decision-making in multi-aircraft cooperative air combat, this paper proposes a deep-reinforcement-learning-based multi-aircraft cooperative air combat decision framework (DRL-MACACDF). Building on proximal policy optimization (PPO), four algorithm enhancement mechanisms are designed to improve the degree of synergy among agents in multi-aircraft cooperative confrontation scenarios. The feasibility and practicality of the method are verified through simulation on a wargame platform, an interpretable post-hoc analysis of the confrontation-process data is carried out, and cross-disciplinary research directions combining reinforcement learning with traditional wargaming are discussed. Copyright © 2021 Acta Automatica Sinica. All rights reserved.
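The abstract names proximal policy optimization (PPO) as the base algorithm of the framework. As a rough illustration only (not the paper's implementation, and without the paper's four enhancement mechanisms), PPO's standard clipped surrogate loss can be sketched as follows; all function and variable names here are chosen for illustration:

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate policy loss from PPO (Schulman et al., 2017).

    The probability ratio between the new and old policies is clipped to
    [1 - eps, 1 + eps], so a single gradient update cannot move the
    policy too far from the one that collected the data.
    """
    ratio = np.exp(logp_new - logp_old)          # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Pessimistic bound: take the element-wise minimum, then negate so
    # that minimizing this loss maximizes the surrogate objective.
    return -np.mean(np.minimum(unclipped, clipped))
```

In a multi-agent setting such as the one described here, a loss of this form would typically be evaluated per agent (or on a shared policy) over trajectories gathered from the cooperative confrontation environment.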
Pages: 1610-1623
Page count: 14
References (40 in total)
  • [1] Li Qing-Ying, Overview of collaborative air combat technology development and operational mode, Science and Technology and Innovation, pp. 124-126, (2020)
  • [2] Isaacs R., Differential Games: A Mathematical Theory With Applications to Warfare and Pursuit, Control and Optimization, (1999)
  • [3] Yan T, Cai Y, Xu B., Evasion guidance algorithms for air-breathing hypersonic vehicles in three-player pursuit-evasion games, Chinese Journal of Aeronautics, 33, 12, pp. 3423-3436, (2020)
  • [4] Karelahti J, Virtanen K, Raivio T., Near-optimal missile avoidance trajectories via receding horizon control, Journal of Guidance Control and Dynamics, 30, 5, pp. 1287-1298, (2007)
  • [5] Oyler D W, Kabamba P T, Girard A R., Pursuit-evasion games in the presence of obstacles, Automatica, 65, pp. 1-11, (2016)
  • [6] Li W., The confinement-escape problem of a defender against an evader escaping from a circular region, IEEE Transactions on Cybernetics, 46, 4, pp. 1028-1039, (2016)
  • [7] Sun Q L, Shen M H, Gu X L, Hou K, Qi N M., Evasion-pursuit strategy against defended aircraft based on differential game theory, International Journal of Aerospace Engineering, 2019, pp. 1-12, (2019)
  • [8] Scott W L, Leonard N E., Optimal evasive strategies for multiple interacting agents with motion constraints, Automatica, 94, pp. 26-34, (2018)
  • [9] Shao Jiang, Xu Yang, Luo De-Lin, Cooperative combat decision-making research for multiple UAVs, Information and Control, 47, pp. 347-354, (2018)
  • [10] Virtanen K, Karelahti J, Raivio T., Modeling air combat by a moving horizon influence diagram game, Journal of Guidance Control and Dynamics, 29, 5, pp. 1080-1091, (2006)