Decentralized Multi-Agent Reinforcement Learning with Global State Prediction

Cited by: 3
Authors
Bloom, Joshua [1]
Paliwal, Pranjal [1]
Mukherjee, Apratim [1]
Pinciroli, Carlo [1]
Affiliations
[1] Worcester Polytechnic Institute, Robotics Engineering, Worcester, MA 01609, USA
Source
2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) | 2023
DOI
10.1109/IROS55552.2023.10341563
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Deep reinforcement learning (DRL) has seen remarkable success in the control of single robots. However, applying DRL to robot swarms presents significant challenges. A critical challenge is non-stationarity, which occurs when two or more robots update individual or shared policies concurrently, thereby engaging in an interdependent training process with no guarantees of convergence. Circumventing non-stationarity typically involves training the robots with global information about other agents' states and/or actions. In contrast, in this paper we explore how to remove the need for global information. We pose our problem as a Partially Observable Markov Decision Process, due to the absence of global knowledge of other agents. Using collective transport as a testbed scenario, we study two approaches to multi-agent training. In the first, the robots exchange no messages, and are trained to rely on implicit communication through push-and-pull on the object to transport. In the second approach, we introduce Global State Prediction (GSP), a network trained to form a belief over the swarm as a whole and predict its future states. We provide a comprehensive study over four well-known deep reinforcement learning algorithms in environments with obstacles, measuring performance as the successful transport of the object to a goal location within a desired time-frame. Through an ablation study, we show that including GSP boosts performance and increases robustness when compared with methods that use global knowledge.
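The abstract's central mechanism, Global State Prediction, lends itself to a short illustration. Below is a minimal sketch of a GSP-style predictor, assuming a PyTorch MLP that maps the swarm's current flattened state to a prediction of its next state and is trained with a supervised loss on observed transitions. All names, dimensions, and the architecture itself are illustrative assumptions; the abstract does not specify them.

    # Minimal sketch of a Global-State-Prediction-style network (assumed
    # architecture; the paper's abstract does not specify one).
    import torch
    import torch.nn as nn

    class GlobalStatePredictor(nn.Module):
        def __init__(self, n_robots: int, state_dim: int, hidden: int = 128):
            super().__init__()
            flat = n_robots * state_dim  # flattened swarm state (e.g. poses)
            self.net = nn.Sequential(
                nn.Linear(flat, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, flat),
            )

        def forward(self, swarm_state: torch.Tensor) -> torch.Tensor:
            # swarm_state: (batch, n_robots * state_dim) -> predicted next state
            return self.net(swarm_state)

    # One supervised update on an observed transition (s_t -> s_{t+1}),
    # using random placeholder data in place of real swarm trajectories:
    gsp = GlobalStatePredictor(n_robots=4, state_dim=4)
    opt = torch.optim.Adam(gsp.parameters(), lr=1e-3)
    s_t = torch.randn(32, 16)     # batch of current swarm states
    s_next = torch.randn(32, 16)  # corresponding next swarm states
    loss = nn.functional.mse_loss(gsp(s_t), s_next)
    opt.zero_grad()
    loss.backward()
    opt.step()

In a decentralized setting along the lines the abstract describes, each robot could concatenate such a predicted future swarm state to its local observation, giving its policy a learned belief over the collective without access to ground-truth global information.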
Pages: 8854-8861
Page count: 8