Event-Triggered Communication Network With Limited-Bandwidth Constraint for Multi-Agent Reinforcement Learning

Cited by: 31
Authors
Hu, Guangzheng [1 ,2 ]
Zhu, Yuanheng [1 ,2 ]
Zhao, Dongbin [1 ,2 ]
Zhao, Mengchen [3 ]
Hao, Jianye [3 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[2] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
[3] Huawei, Noah's Ark Lab, Beijing 100085, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Bandwidth; Protocols; Reinforcement learning; Task analysis; Optimization; Communication networks; Multi-agent systems; Event trigger; limited bandwidth; multi-agent communication; multi-agent reinforcement learning (MARL); IMPROVING COORDINATION;
DOI
10.1109/TNNLS.2021.3121546
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Communicating with each other in a distributed manner and behaving as a group are essential in multi-agent reinforcement learning. However, real-world multi-agent systems are restricted by limited-bandwidth communication. If the bandwidth is fully occupied, some agents cannot send messages to others promptly, causing decision delays and impairing cooperation. Recent work has begun to address this problem but still falls short of maximally reducing the consumption of communication resources. In this article, we propose an event-triggered communication network (ETCNet) to enhance communication efficiency in multi-agent systems by communicating only when necessary. For different task requirements, two paradigms of the ETCNet framework, the event-triggered sending network (ETSNet) and the event-triggered receiving network (ETRNet), are proposed for learning efficient sending and receiving protocols, respectively. Leveraging information theory, the limited bandwidth is translated into the penalty threshold of an event-triggered strategy, which determines whether an agent participates in communication at each step. The design of the event-triggered strategy is then formulated as a constrained Markov decision problem, and reinforcement learning finds a feasible and optimal communication protocol that satisfies the limited-bandwidth constraint. Experiments on typical multi-agent tasks demonstrate that ETCNet outperforms other methods in reducing bandwidth occupancy while largely preserving the cooperative performance of the multi-agent system.
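The core mechanism described in the abstract, transmitting only when an event condition fires, can be illustrated with a minimal sketch. This is not the authors' learned protocol: the norm-based trigger, the threshold value `delta`, the message dimension, and the random-walk message model below are all assumptions chosen for illustration. The idea is that an agent sends a message only when its candidate message deviates from the last transmitted one by more than a threshold, so bandwidth occupancy stays low when messages change slowly.

```python
import numpy as np


def event_trigger(message, last_sent, delta):
    """Fire (return True) only when the candidate message deviates
    from the last transmitted one by more than the threshold delta."""
    return np.linalg.norm(message - last_sent) > delta


rng = np.random.default_rng(0)
steps = 100
msg = np.zeros(4)        # the agent's local message signal
last_sent = np.zeros(4)  # the receiver's copy (last transmitted value)
sent = 0

for _ in range(steps):
    # Illustrative dynamics: the message drifts a little each step.
    msg = msg + rng.normal(scale=0.1, size=4)
    if event_trigger(msg, last_sent, delta=0.5):
        last_sent = msg.copy()  # transmit: receiver's copy is refreshed
        sent += 1

# Bandwidth occupancy is the fraction of steps on which a message
# was actually sent; the trigger keeps it well below 1.
occupancy = sent / steps
```

In ETCNet the trigger is not a fixed norm test but is learned under a constrained Markov decision formulation, with the penalty threshold derived from the bandwidth limit via information theory; the sketch only conveys the event-triggered gating pattern.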
Pages: 3966-3978
Page count: 13