Autonomous Dogfight Decision-Making for Air Combat Based on Reinforcement Learning with Automatic Opponent Sampling

Times Cited: 0
Authors
Chen, Can [1]
Song, Tao [1]
Mo, Li [1]
Lv, Maolong [2]
Lin, Defu [1]
Affiliations
[1] Beijing Inst Technol, Sch Aerosp Engn, Beijing 100081, Peoples R China
[2] Air Force Engn Univ, Air Traff Control & Nav Sch, Xian 710051, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
air combat; dogfight; autonomous decision-making; reinforcement learning; automatic opponent sampling; proximal policy optimization;
DOI
10.3390/aerospace12030265
Chinese Library Classification (CLC)
V [Aeronautics, Astronautics];
Discipline Classification Code
08; 0825;
Abstract
Autonomous air combat has attracted surging interest, propelled by the rapid progress of artificial intelligence. A persistent challenge in this domain is autonomous decision-making for dogfighting, especially under intricate, high-fidelity nonlinear aircraft dynamics and incomplete information. To address this challenge, this paper applies reinforcement learning (RL) to train maneuvering strategies. In RL for dogfighting, how opponents are sampled during training largely determines training efficacy. Accordingly, this paper proposes a novel automatic opponent sampling (AOS)-based RL framework built on proximal policy optimization (PPO). The framework comprises three components: a phased opponent policy pool with simulated annealing (SA)-inspired curriculum learning, an SA-inspired Boltzmann Meta-Solver, and a Gate Function based on a sliding window. (An illustrative sketch of these components follows this record.) Training results demonstrate that the improved PPO algorithm with the AOS framework outperforms existing RL methods such as the soft actor-critic (SAC) algorithm and PPO with prioritized fictitious self-play (PFSP). Moreover, in testing, the trained maneuvering policy adapts well to a diverse array of opponents. This work marks a substantial step toward robust autonomous maneuvering decision systems for modern air combat.
Pages: 24
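The abstract names the three AOS components but, as a bibliographic record, gives no implementation details. Below is a minimal, self-contained Python sketch of one plausible reading: a Boltzmann opponent sampler whose temperature anneals like an SA schedule, plus a sliding-window gate that decides when to promote a policy snapshot into the pool. Every name and parameter here (BoltzmannOpponentSampler, SlidingWindowGate, t0, decay, window, threshold, the scripted seed opponents, the random stand-in rollout) is an illustrative assumption, not the authors' method.

```python
import math
import random
from collections import deque


class BoltzmannOpponentSampler:
    """SA-inspired Boltzmann sampling over an opponent policy pool.

    Each pool entry carries a score (e.g., the learner's loss rate against
    that opponent), so higher scores mean harder opponents. The temperature
    anneals from t0 toward t_min, shifting sampling mass from near-uniform
    exploration toward the hardest opponents, echoing simulated annealing.
    """

    def __init__(self, t0=5.0, t_min=0.5, decay=0.95):
        self.pool = []  # list of [policy, score] pairs
        self.temperature = t0
        self.t_min = t_min
        self.decay = decay

    def add(self, policy, score=1.0):
        self.pool.append([policy, score])

    def sample(self):
        # Softmax over scores at the current temperature (max-subtracted
        # for numerical stability); returns one opponent policy.
        logits = [score / self.temperature for _, score in self.pool]
        peak = max(logits)
        weights = [math.exp(l - peak) for l in logits]
        return random.choices(self.pool, weights=weights, k=1)[0][0]

    def anneal(self):
        self.temperature = max(self.t_min, self.temperature * self.decay)


class SlidingWindowGate:
    """Opens once the learner's win rate over the last `window`
    evaluation episodes reaches `threshold`."""

    def __init__(self, window=50, threshold=0.6):
        self.results = deque(maxlen=window)
        self.threshold = threshold

    def record(self, won):
        self.results.append(1.0 if won else 0.0)

    def open(self):
        full = len(self.results) == self.results.maxlen
        return full and sum(self.results) / len(self.results) >= self.threshold


if __name__ == "__main__":
    sampler = BoltzmannOpponentSampler()
    gate = SlidingWindowGate(window=20, threshold=0.6)
    for name in ("scripted_pursuit", "scripted_evade"):
        sampler.add(name)  # seed the pool with scripted opponents

    for episode in range(500):
        opponent = sampler.sample()
        won = random.random() < 0.7  # stand-in for a real dogfight rollout
        gate.record(won)
        if gate.open():
            # Promote a snapshot of the current policy and move the
            # curriculum one step: harder sampling, fresh evaluation window.
            sampler.add(f"snapshot_ep{episode}", score=2.0)
            sampler.anneal()
            gate.results.clear()

    print(f"pool size = {len(sampler.pool)}, T = {sampler.temperature:.2f}")
```

In this sketch the annealed temperature stands in for the SA schedule: a high initial temperature samples the pool almost uniformly (an easy, exploratory curriculum), while the decayed temperature concentrates sampling on the highest-scoring, hardest opponents.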