End-to-End Control of USV Swarm Using Graph-Centric Multi-Agent Reinforcement Learning

Cited by: 2
Authors
Lee, Kanghoon [1 ]
Ahn, Kyuree [1 ]
Park, Jinkyoo [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Ind & Syst Engn, Daejeon, South Korea
Source
2021 21ST INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2021) | 2021
Keywords
USV swarm; Multi-agent reinforcement learning; Graph Neural Network
DOI
10.23919/ICCAS52745.2021.9649839
CLC Number
TP [Automation technology, computer technology]
Subject Classification Code
0812
Abstract
Unmanned Surface Vehicles (USVs), which operate on the water surface without an onboard crew, are used in various naval defense missions. Such missions can be conducted more efficiently when a swarm of USVs is operated simultaneously. However, it is challenging to establish a decentralized control strategy for all USVs, and the strategy must also account for various external factors, such as the ocean topography and the number of enemy forces. These difficulties call for a scalable and transferable decision-making module. This study proposes an algorithm that derives a decentralized and cooperative control strategy for a USV swarm using graph-centric multi-agent reinforcement learning (MARL). The model first expresses the mission situation as a graph that reflects the various sensor ranges. Each USV agent encodes its observations into a localized embedding and then derives a coordinated action through communication with the surrounding agents. To obtain a cooperative policy, each agent's policy is trained to maximize the team reward. Using a modified prey-predator environment from OpenAI Gym, we analyze the effect of each component of the proposed model (state embedding, communication, and team reward). The ablation study shows that the proposed model derives a scalable and transferable control policy for USVs, consistently achieving the highest win ratio.
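The abstract outlines three ingredients: a communication graph built from sensor ranges, localized per-agent embeddings refined by exchanging messages with neighbouring agents, and a shared team reward. The sketch below, written in Python with PyTorch, illustrates only the first two steps under assumed names and dimensions (SENSOR_RANGE, OBS_DIM, EMBED_DIM, GraphEncoder, mean aggregation); it is not the authors' implementation.

# Minimal sketch (not the paper's code): build a communication graph from agent
# positions using an assumed sensor range, then compute per-agent embeddings with
# one round of mean-aggregation message passing. All names and sizes are
# illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

SENSOR_RANGE = 5.0        # assumed sensing/communication radius
OBS_DIM, EMBED_DIM = 4, 16

def build_adjacency(positions: np.ndarray, radius: float) -> torch.Tensor:
    """Connect agents whose pairwise distance is within the sensor range."""
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    adj = (dists <= radius).astype(np.float32)
    np.fill_diagonal(adj, 0.0)  # no self-loops; an agent's own state is encoded separately
    return torch.from_numpy(adj)

class GraphEncoder(nn.Module):
    """Encode each agent's local observation, then mix in neighbour messages."""

    def __init__(self, obs_dim: int, embed_dim: int):
        super().__init__()
        self.node_enc = nn.Linear(obs_dim, embed_dim)    # localized observation embedding
        self.msg_enc = nn.Linear(embed_dim, embed_dim)   # message transformation
        self.update = nn.Linear(2 * embed_dim, embed_dim)

    def forward(self, obs: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.node_enc(obs))                 # (N, embed_dim)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)  # neighbour counts
        msgs = adj @ torch.relu(self.msg_enc(h)) / deg     # mean aggregation over neighbours
        return torch.relu(self.update(torch.cat([h, msgs], dim=-1)))

if __name__ == "__main__":
    n_agents = 6
    positions = np.random.uniform(0.0, 10.0, size=(n_agents, 2))
    observations = torch.randn(n_agents, OBS_DIM)
    adj = build_adjacency(positions, SENSOR_RANGE)
    embeddings = GraphEncoder(OBS_DIM, EMBED_DIM)(observations, adj)
    print(embeddings.shape)  # torch.Size([6, 16]); one embedding per USV agent

In a complete pipeline, each row of the resulting embedding matrix would feed a per-agent policy head, and all heads would be trained with MARL against the single team reward, so that execution remains decentralized while training encourages cooperation.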
Pages: 925-929
Number of pages: 5