Efficient Communication via Self-Supervised Information Aggregation for Online and Offline Multiagent Reinforcement Learning

Cited: 0
Authors
Guan, Cong [1 ,2 ]
Chen, Feng [1 ,2 ]
Yuan, Lei [3 ]
Zhang, Zongzhang [1 ,2 ]
Yu, Yang [3 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] Nanjing Univ, Sch Artificial Intelligence, Nanjing 210023, Peoples R China
[3] Polixir Technol, Nanjing 211106, Peoples R China
Funding
US National Science Foundation;
关键词
Benchmark testing; Reinforcement learning; Observability; Training; Learning (artificial intelligence); Decision making; Data mining; Cooperative multiagent reinforcement learning (MARL); multiagent communication; offline learning; representation learning;
DOI
10.1109/TNNLS.2024.3420791
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Utilizing messages from teammates can improve coordination in cooperative multiagent reinforcement learning (MARL). Previous works typically combine raw messages from teammates with local information as policy inputs. However, neglecting message aggregation makes policy learning significantly inefficient. Motivated by recent advances in representation learning, we argue that efficient message aggregation is essential for good coordination in cooperative MARL. In this article, we propose Multiagent communication via Self-supervised Information Aggregation (MASIA), in which agents aggregate received messages into compact, highly relevant representations that augment the local policy. Specifically, we design a permutation-invariant message encoder that generates a common information-aggregated representation from the messages, and we optimize it in a self-supervised manner by reconstructing current information and predicting ("shooting") future information. Each agent then utilizes the most relevant parts of the aggregated representation for decision-making through a novel message extraction mechanism. Furthermore, considering the potential of offline learning for real-world applications, we build offline benchmarks for multiagent communication, which, to the best of our knowledge, are the first of their kind. Empirical results demonstrate the superiority of our method in both online and offline settings. We also release these offline benchmarks as a testbed for validating communication ability, to facilitate future research in this direction.
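To make the aggregation idea concrete, the following is a minimal PyTorch sketch of a permutation-invariant message encoder trained with reconstruction and future-prediction ("shooting") objectives, as described in the abstract. The class name MessageAggregator, the mean-pooling choice, the layer sizes, and the unweighted loss sum are illustrative assumptions, not the authors' actual MASIA architecture.

```python
# Illustrative sketch only: names, dimensions, pooling, and loss weighting
# are assumptions, not the paper's actual implementation.
import torch
import torch.nn as nn


class MessageAggregator(nn.Module):
    """Aggregates a set of teammate messages into one compact representation.

    Mean-pooling over per-message embeddings is invariant to the order of
    incoming messages, giving a permutation-invariant set encoder in the
    spirit of Deep Sets.
    """

    def __init__(self, n_agents: int, msg_dim: int, hidden_dim: int, repr_dim: int):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Linear(msg_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, repr_dim),
        )
        # Self-supervised heads: reconstruct the current messages and
        # "shoot" (predict) the next step's aggregated representation.
        # Per-agent reconstruction assumes a fixed agent indexing.
        self.reconstruct = nn.Linear(repr_dim, n_agents * msg_dim)
        self.predict_next = nn.Linear(repr_dim, repr_dim)

    def forward(self, messages: torch.Tensor) -> torch.Tensor:
        # messages: (batch, n_agents, msg_dim) -> (batch, repr_dim)
        return self.embed(messages).mean(dim=1)

    def self_supervised_loss(self, messages: torch.Tensor,
                             next_messages: torch.Tensor) -> torch.Tensor:
        z = self(messages)
        # Reconstruction term: the aggregate should retain message content.
        recon = self.reconstruct(z).view_as(messages)
        recon_loss = (recon - messages).pow(2).mean()
        # Future-shooting term: predict the next aggregate from the current one.
        with torch.no_grad():  # stop gradients flowing into the target
            z_next = self(next_messages)
        shoot_loss = (self.predict_next(z) - z_next).pow(2).mean()
        return recon_loss + shoot_loss


# Usage: 4 agents exchanging 16-dimensional messages, batch of 8 timesteps.
agg = MessageAggregator(n_agents=4, msg_dim=16, hidden_dim=64, repr_dim=32)
msgs, next_msgs = torch.randn(8, 4, 16), torch.randn(8, 4, 16)
z = agg(msgs)  # (8, 32) aggregate fed to each agent's policy
loss = agg.self_supervised_loss(msgs, next_msgs)
loss.backward()
```

Note that the sketch omits the per-agent message extraction step: in the paper, each agent further selects the parts of the shared aggregate most relevant to its own decision-making, rather than consuming the full representation directly.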
Pages: 13