Multi-agent deep reinforcement learning with type-based hierarchical group communication

Cited by: 13

Authors:

Jiang, Hao [1]

Shi, Dianxi [2, 3]

Xue, Chao [2, 3]

Wang, Yajie [1]

Wang, Gongju [2]

Zhang, Yongjun [2]

Affiliations:

[1] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China

[2] Natl Innovat Inst Def Technol, Artificial Intelligence Res Ctr, Beijing, Peoples R China

[3] Tianjin Artificial Intelligence Innovat Ctr, Tianjin, Peoples R China

Source:

APPLIED INTELLIGENCE | 2021 / Vol. 51 / Issue 08

Funding:

China Postdoctoral Science Foundation;

Keywords:

Multi-agent reinforcement learning; Group cognitive consistency; Group communication; Value decomposition;

DOI:

10.1007/s10489-020-02065-9

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Real-world multi-agent tasks often involve agents of varying types and quantities. The complex interaction relationships among these agents make policy learning very difficult, because the agents need to learn various interaction types to complete a given task. Therefore, simplifying the learning process is an important issue. In multi-agent systems, agents of a similar type often interact more with each other and exhibit more similar behaviors, which means there is stronger collaboration between these agents. Most existing multi-agent reinforcement learning (MARL) algorithms try to learn the collaborative strategies of all agents directly in order to maximize the common reward. This causes the difficulty of policy learning to increase exponentially as the number and types of agents increase. To address this problem, we propose a type-based hierarchical group communication (THGC) model. This model uses prior domain knowledge or predefined rules to group agents, and maintains each group's cognitive consistency through knowledge sharing. Subsequently, we introduce a group communication and value decomposition method to ensure cooperation between the various groups. Experiments demonstrate that our model outperforms state-of-the-art MARL methods on the widely adopted StarCraft II benchmarks across different scenarios, and also has potential value for large-scale real-world applications.
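The abstract only outlines the model. As a rough illustration of two of its ingredients, grouping agents by a predefined type rule and decomposing a team value across groups, here is a minimal Python sketch. It is not the THGC architecture: the grouping rule (same unit type means same group), the fixed mixing weights, the example unit names and Q-values, and all function names are hypothetical; THGC's knowledge sharing and group communication are omitted entirely, and in a real MARL system the mixing weights would be learned rather than fixed.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def group_by_type(agent_types: Dict[int, str]) -> Dict[str, List[int]]:
    """Hypothetical predefined rule: agents sharing a unit type form one group."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for agent_id in sorted(agent_types):
        groups[agent_types[agent_id]].append(agent_id)
    return dict(groups)


def monotonic_mix(values: List[float], weights: List[float]) -> float:
    """Weighted sum with non-negative weights (via abs), so the output never
    decreases when any input value increases (a QMIX-style monotonicity constraint)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(abs(w) * v for w, v in zip(weights, values))


def hierarchical_q_tot(
    groups: Dict[str, List[int]],
    agent_q: Dict[int, float],
    group_weights: Optional[Dict[str, float]] = None,
) -> float:
    """Two-level decomposition: agent Q-values -> group values -> team value.
    With every weight equal to 1 this reduces to a VDN-style plain sum."""
    group_q = {
        g: monotonic_mix([agent_q[a] for a in members], [1.0] * len(members))
        for g, members in groups.items()
    }
    names = list(group_q)
    weights = [1.0 if group_weights is None else group_weights[g] for g in names]
    return monotonic_mix([group_q[g] for g in names], weights)


if __name__ == "__main__":
    # Hypothetical StarCraft II-style team: agent id -> unit type.
    agent_types = {0: "marine", 1: "marine", 2: "medivac", 3: "marauder", 4: "marauder"}
    # Hypothetical per-agent Q-values for the chosen actions.
    agent_q = {0: 1.5, 1: 2.0, 2: 0.5, 3: 1.0, 4: 3.0}
    groups = group_by_type(agent_types)
    print(groups)  # {'marine': [0, 1], 'medivac': [2], 'marauder': [3, 4]}
    print(hierarchical_q_tot(groups, agent_q))  # 8.0
    print(hierarchical_q_tot(groups, agent_q, {"marine": 2.0, "medivac": 1.0, "marauder": 0.5}))  # 9.5
```

With unit weights the hierarchy collapses to a plain sum over all agents; the grouping only changes the result once group-level weights (or a nonlinear, state-conditioned monotonic mixer) differ across groups.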


Pages: 5793-5808

Page count: 16

References (49 in total):

[41] Sutton RS, 2000, ADV NEUR IN, V12, P1057

[42] Velickovic P, 2017, P INT C LEARN REPR

[43] Wang WX, 2020, AAAI CONF ARTIF INTE, V34, P7293

[44] Werbos PJ, 1990, Backpropagation through time - what it does and how to do it, PROCEEDINGS OF THE IEEE, V78, N10, P1550-1560

[45] Whiteson, 2018, QMIX MONOTONIC VALUE

[46] Wooldridge M, Jennings NR, Kinny D, 2000, The Gaia methodology for agent-oriented analysis and design, AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, V3, N3, P285-312

[47] Yang YD, 2018, PR MACH LEARN RES, V80

[48] Yu C, Zhang M, Ren F, Tan G, 2015, Multiagent learning of coordination in loosely coupled multiagent systems, IEEE TRANSACTIONS ON CYBERNETICS, V45, N12, P2853-2867

[49] Zhang Z, 2019, Integrating independent and centralized multi-agent reinforcement learning for traffic signal network optimization
