Coach-assisted multi-agent reinforcement learning framework for unexpected crashed agents

Cited by: 6
Authors
ZHAO, Jian [1]
ZHAO, Youpeng [1]
WANG, Weixun [2]
YANG, Mingyu [1]
HU, Xunhan [1]
ZHOU, Wengang [1]
HAO, Jianye [2]
LI, Houqiang [1]
Affiliations
[1] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230026, Peoples R China
[2] Tianjin Univ, Coll Intelligence & Comp, Tianjin 300072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multi-agent system; Reinforcement learning; Unexpected crashed agents
DOI
10.1631/FITEE.2100594
CLC number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Multi-agent reinforcement learning is difficult to apply in practice, partially because of the gap between simulated and real-world scenarios. One reason for the gap is that simulated systems always assume that agents can work normally all the time, while in practice, one or more agents may unexpectedly "crash" during the coordination process due to inevitable hardware or software failures. Such crashes destroy the cooperation among agents and lead to performance degradation. In this work, we present a formal conceptualization of a cooperative multi-agent reinforcement learning system with unexpected crashes. To enhance the robustness of the system to crashes, we propose a coach-assisted multi-agent reinforcement learning framework that introduces a virtual coach agent to adjust the crash rate during training. We have designed three coaching strategies (fixed crash rate, curriculum learning, and adaptive crash rate) and a re-sampling strategy for our coach agent. To our knowledge, this work is the first to study unexpected crashes in a multi-agent system. Extensive experiments on grid-world and StarCraft II micromanagement tasks demonstrate the efficacy of the adaptive strategy compared with the fixed crash rate strategy and curriculum learning strategy. The ablation study further illustrates the effectiveness of our re-sampling strategy.
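The record does not include the paper's implementation details, so the following is a minimal, hypothetical Python sketch of how a virtual coach might set the per-episode crash rate under the three strategies named in the abstract (fixed crash rate, curriculum learning, adaptive crash rate). The class name CrashRateCoach, the step-based adaptive rule, and all parameter values are illustrative assumptions, not the authors' actual method.

import random

class CrashRateCoach:
    """Hypothetical coach that sets the crash probability applied to agents
    during training. Strategy names follow the abstract; details are assumed:
      - "fixed":      constant crash rate throughout training
      - "curriculum": crash rate grows linearly toward a maximum
      - "adaptive":   crash rate rises when recent team return improves
                      and falls when it degrades
    """

    def __init__(self, strategy="adaptive", fixed_rate=0.2,
                 max_rate=0.5, total_episodes=10000, step=0.02):
        self.strategy = strategy
        self.fixed_rate = fixed_rate
        self.max_rate = max_rate
        self.total_episodes = total_episodes
        self.step = step
        self.rate = 0.0
        self.prev_return = None

    def crash_rate(self, episode, recent_return=None):
        if self.strategy == "fixed":
            self.rate = self.fixed_rate
        elif self.strategy == "curriculum":
            # Ramp the difficulty up linearly over the training run.
            self.rate = self.max_rate * min(1.0, episode / self.total_episodes)
        else:
            # Adaptive: make crashes more frequent while the team is improving,
            # back off when performance drops.
            if self.prev_return is not None and recent_return is not None:
                delta = self.step if recent_return >= self.prev_return else -self.step
                self.rate = min(self.max_rate, max(0.0, self.rate + delta))
            self.prev_return = recent_return
        return self.rate

    def sample_crashes(self, n_agents, episode, recent_return=None):
        """Return a boolean mask marking which agents crash this episode."""
        p = self.crash_rate(episode, recent_return)
        return [random.random() < p for _ in range(n_agents)]

if __name__ == "__main__":
    coach = CrashRateCoach(strategy="adaptive")
    fake_return = 0.0
    for ep in range(5):
        fake_return += 1.0  # pretend the team keeps improving
        mask = coach.sample_crashes(n_agents=4, episode=ep, recent_return=fake_return)
        print(ep, round(coach.rate, 3), mask)

In a training loop, the sampled mask would decide which agents are disabled for the episode; the re-sampling strategy described in the abstract would then govern how training episodes are drawn, which this sketch does not attempt to reproduce.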
Pages: 1032-1042
Number of pages: 11