Tactical Reward Shaping for Large-Scale Combat by Multi-Agent Reinforcement Learning

Cited by: 1
Authors
Duo, Nanxun [1 ]
Wang, Qinzhao [1 ]
Lyu, Qiang [2 ]
Wang, Wei [3 ]
Affiliations
[1] Acad Army Armored Forces, Dept Weap & Control, Beijing 100072, Peoples R China
[2] Beijing South Technol Co Ltd, Beijing 100176, Peoples R China
[3] Beijing Special Vehicle Inst, Beijing 100072, Peoples R China
Keywords
deep reinforcement learning; multi-agent combat; multi-agent reinforcement learning; unmanned battle; reward shaping
DOI
10.23919/JSEE.2024.000062
CLC number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Future unmanned battles urgently require intelligent combat policies, and multi-agent reinforcement learning offers a promising solution. However, owing to the complexity of combat operations and the large size of the combat group, this task suffers from the credit assignment problem more severely than other reinforcement learning tasks. This study uses reward shaping to relieve the credit assignment problem and improve policy training for the new generation of large-scale unmanned combat operations. We first prove that multiple reward shaping functions do not change the Nash Equilibrium in stochastic games, providing theoretical support for their use. Based on the characteristics of combat operations, we propose tactical reward shaping (TRS), which comprises maneuver shaping advice and threat assessment-based attack shaping advice. We then investigate the effects of different types and combinations of shaping advice on combat policies through experiments. The results show that TRS improves both the efficiency and attack accuracy of combat policies, and that the combination of maneuver reward shaping advice and ally-focused attack shaping advice achieves the best performance relative to the baseline strategy.
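The abstract's guarantee that shaping does not change the Nash Equilibrium matches the known result for potential-based reward shaping, where the shaping advice takes the form F(s, s') = γΦ(s') − Φ(s). The Python sketch below illustrates that general construction with hypothetical maneuver and attack potentials; the state fields, weights, and function names are assumptions for illustration only, not the paper's TRS implementation.

import numpy as np

# Minimal sketch of potential-based shaping advice combining a maneuver term
# and a threat-based attack term (illustrative assumptions throughout).

GAMMA = 0.99  # assumed discount factor

def maneuver_potential(state):
    # Hypothetical potential: negative distance from the agent to its objective,
    # so closing the distance raises the potential.
    return -np.linalg.norm(state["agent_pos"] - state["objective_pos"])

def attack_potential(state):
    # Hypothetical potential: assessed threat level of the currently targeted
    # enemy, so engaging a higher-threat target raises the potential.
    return state["target_threat"]

def shaped_reward(env_reward, state, next_state, w_maneuver=0.5, w_attack=0.5):
    # Potential-based shaping: F(s, s') = GAMMA * Phi(s') - Phi(s) per term,
    # weighted and added to the environment reward.
    f_maneuver = GAMMA * maneuver_potential(next_state) - maneuver_potential(state)
    f_attack = GAMMA * attack_potential(next_state) - attack_potential(state)
    return env_reward + w_maneuver * f_maneuver + w_attack * f_attack

# Example transition: the agent moves toward its objective and selects a
# higher-threat target, so both shaping terms contribute positively.
s = {"agent_pos": np.array([0.0, 0.0]), "objective_pos": np.array([10.0, 0.0]), "target_threat": 0.2}
s_next = {"agent_pos": np.array([1.0, 0.0]), "objective_pos": np.array([10.0, 0.0]), "target_threat": 0.6}
print(shaped_reward(env_reward=0.0, state=s, next_state=s_next))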
Pages: 1516-1529
Page count: 14