A sample selection mechanism for multi-UCAV air combat policy training using multi-agent reinforcement learning

Times Cited: 0
Authors
Yan, Zihui [1 ,2 ]
Liang, Xiaolong [1 ,2 ]
Hou, Yueqi [1 ,2 ]
Yang, Aiwu [1 ,2 ]
Zhang, Jiaqiang [1 ,2 ]
Wang, Ning [1 ,2 ]
Affiliations
[1] Air Force Engn Univ, Air Traff Control & Nav Sch, Xian 710051, Peoples R China
[2] Shaanxi Key Lab Meta Synth Elect & Informat Syst, Xian 710051, Peoples R China
Keywords
Unmanned combat aerial vehicle; Air combat; Sample selection; Multi-agent reinforcement learning; Proximal policy optimization; Decision-making
DOI
10.1016/j.cja.2024.103391
Chinese Library Classification
V [Aeronautics, Astronautics]
Discipline Classification Codes
08; 0825
Abstract
Policy training against diverse opponents remains a challenge when using Multi-Agent Reinforcement Learning (MARL) in multiple Unmanned Combat Aerial Vehicle (UCAV) air combat scenarios. To address this challenge, this paper proposes a novel Dominant and Non-dominant strategy sample selection (DoNot) mechanism and a Local Observation Enhanced Multi-Agent Proximal Policy Optimization (LOE-MAPPO) algorithm to train the multi-UCAV air combat policy and improve its generalization. Specifically, the LOE-MAPPO algorithm adopts a mixed state that concatenates the global state with each agent's local observation, enabling efficient value function learning in multi-UCAV air combat. The DoNot mechanism classifies opponents as dominant-strategy or non-dominant-strategy opponents and samples from easier to more challenging opponents, forming an adaptive training curriculum. Empirical results demonstrate that the proposed LOE-MAPPO algorithm outperforms baseline MARL algorithms in multi-UCAV air combat scenarios, and that the DoNot mechanism yields stronger policy generalization against diverse opponents. These results pave the way for the fast generation of cooperative strategies for air combat agents with MARL algorithms. (c) 2025 The Authors. Published by Elsevier Ltd on behalf of Chinese Society of Aeronautics and Astronautics. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
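The two ideas the abstract summarizes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the linear difficulty-weighting schedule, and the scalar `progress` parameter are all assumptions introduced here; the abstract only states that the critic input concatenates the global state with an agent's local observation, and that opponents are sampled from easier to harder.

```python
import numpy as np

def mixed_state(global_state, local_obs):
    """Concatenate the shared global state with one agent's local
    observation, giving the mixed critic input described for LOE-MAPPO."""
    return np.concatenate([global_state, local_obs])

def curriculum_weights(difficulties, progress):
    """Illustrative easy-to-hard opponent-sampling schedule.

    `difficulties` scores each opponent (higher = harder); `progress`
    runs from 0.0 (start of training) to 1.0 (end). Early on, the
    softmax favors low-difficulty opponents; later it favors hard ones.
    The specific schedule is hypothetical, not taken from the paper.
    """
    d = np.asarray(difficulties, dtype=float)
    logits = (2.0 * progress - 1.0) * d  # negative slope early, positive late
    w = np.exp(logits - logits.max())    # numerically stable softmax
    return w / w.sum()

def sample_opponent(rng, difficulties, progress):
    """Draw one opponent index under the current curriculum weights."""
    return rng.choice(len(difficulties),
                      p=curriculum_weights(difficulties, progress))
```

At `progress=0.0` the weights concentrate on the easiest opponent; at `progress=1.0` on the hardest, so the sampled training distribution shifts gradually, mirroring the adaptive curriculum the DoNot mechanism builds from its dominant/non-dominant classification.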
Pages: 16