Decentralized Multi-Agent Reinforcement Learning with Global State Prediction

Cited by: 3
Authors
Bloom, Joshua [1]
Paliwal, Pranjal [1]
Mukherjee, Apratim [1]
Pinciroli, Carlo [1]
Affiliations
[1] Worcester Polytechnic Institute, Robotics Engineering, Worcester, MA 01609, USA
Source
2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) | 2023
DOI
10.1109/IROS55552.2023.10341563
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Deep reinforcement learning (DRL) has seen remarkable success in the control of single robots. However, applying DRL to robot swarms presents significant challenges. A critical challenge is non-stationarity, which occurs when two or more robots update individual or shared policies concurrently, thereby engaging in an interdependent training process with no guarantees of convergence. Circumventing non-stationarity typically involves training the robots with global information about other agents' states and/or actions. In contrast, in this paper we explore how to remove the need for global information. We pose our problem as a Partially Observable Markov Decision Process, due to the absence of global knowledge of other agents. Using collective transport as a testbed scenario, we study two approaches to multi-agent training. In the first, the robots exchange no messages, and are trained to rely on implicit communication through push-and-pull on the object to transport. In the second approach, we introduce Global State Prediction (GSP), a network trained to form a belief over the swarm as a whole and predict its future states. We provide a comprehensive study over four well-known deep reinforcement learning algorithms in environments with obstacles, measuring performance as the successful transport of the object to a goal location within a desired time-frame. Through an ablation study, we show that including GSP boosts performance and increases robustness when compared with methods that use global knowledge.
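The abstract's central mechanism, Global State Prediction, lends itself to a short illustration. Below is a minimal sketch of a GSP-style predictor, assuming a PyTorch MLP that maps the swarm's current flattened state to a prediction of its next state and is trained with a supervised loss on observed transitions. All names, dimensions, and the architecture itself are illustrative assumptions; the abstract does not specify them.

    # Minimal sketch of a Global-State-Prediction-style network (assumed
    # architecture; the paper's abstract does not specify one).
    import torch
    import torch.nn as nn

    class GlobalStatePredictor(nn.Module):
        def __init__(self, n_robots: int, state_dim: int, hidden: int = 128):
            super().__init__()
            flat = n_robots * state_dim  # flattened swarm state (e.g. poses)
            self.net = nn.Sequential(
                nn.Linear(flat, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, flat),
            )

        def forward(self, swarm_state: torch.Tensor) -> torch.Tensor:
            # swarm_state: (batch, n_robots * state_dim) -> predicted next state
            return self.net(swarm_state)

    # One supervised update on an observed transition (s_t -> s_{t+1}),
    # using random placeholder data in place of real swarm trajectories:
    gsp = GlobalStatePredictor(n_robots=4, state_dim=4)
    opt = torch.optim.Adam(gsp.parameters(), lr=1e-3)
    s_t = torch.randn(32, 16)     # batch of current swarm states
    s_next = torch.randn(32, 16)  # corresponding next swarm states
    loss = nn.functional.mse_loss(gsp(s_t), s_next)
    opt.zero_grad()
    loss.backward()
    opt.step()

In a decentralized setting along the lines the abstract describes, each robot could concatenate such a predicted future swarm state to its local observation, giving its policy a learned belief over the collective without access to ground-truth global information.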
Pages: 8854-8861
Page count: 8