Learning from Good Trajectories in Offline Multi-Agent Reinforcement Learning

Citations: 0
|
Authors
Tian, Qi [1 ]
Kuang, Kun [1 ]
Liu, Furui [2 ]
Wang, Baoxiang [3 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[2] Huawei Noah's Ark Lab, Beijing, Peoples R China
[3] Chinese Univ Hong Kong Shenzhen, Sch Data Sci, Shenzhen, Peoples R China
Source
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 10 | 2023
Funding
National Natural Science Foundation of China;
Keywords
DOI
None available
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets, an important step toward deploying multi-agent systems in real-world applications. In practice, however, the individual behavior policies that generate multi-agent joint trajectories usually perform at different levels; e.g., one agent may be a random policy while the other agents follow medium-quality policies. In cooperative games with a global reward, an agent trained by existing offline MARL methods often inherits this random policy, jeopardizing the performance of the entire team. In this paper, we investigate offline MARL with explicit consideration of the diversity of agent-wise trajectories and propose a novel framework called Shared Individual Trajectories (SIT) to address this problem. Specifically, an attention-based reward decomposition network assigns credit to each agent through a differentiable key-value memory mechanism in an offline manner. These decomposed credits are then used to reconstruct the joint offline dataset into a prioritized experience replay of individual trajectories, after which agents can share their good trajectories and conservatively train their policies with a graph attention network (GAT) based critic. We evaluate our method in both discrete control (i.e., StarCraft II and the multi-agent particle environment) and continuous control (i.e., multi-agent MuJoCo). The results indicate that our method achieves significantly better performance on complex and mixed offline multi-agent datasets, especially when the difference in data quality between individual trajectories is large.
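The two-stage idea in the abstract — decompose the global reward into per-agent credits via attention, then prioritize individual trajectories by their credited return — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, the fixed query vector standing in for the learned key-value memory, and the simple return-proportional priority rule are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def decompose_reward(agent_feats, query, global_reward):
    """Split a shared global reward into per-agent credits.

    agent_feats: (n_agents, d) per-agent observation-action embeddings
    query: (d,) attention query (learned in the paper; fixed here)
    Returns per-agent credits that sum to global_reward.
    """
    scores = agent_feats @ query          # attention logits, (n_agents,)
    weights = softmax(scores)             # credit shares, sum to 1
    return weights * global_reward

def build_priorities(credited_returns, eps=1e-6):
    """Sampling priorities for individual trajectories: higher
    credited return -> more likely to be shared across agents."""
    r = np.asarray(credited_returns, dtype=float)
    r = r - r.min() + eps                 # shift to positive
    return r / r.sum()                    # normalized distribution

# Toy usage: 3 agents, one joint step with global reward 10.
feats = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
credits = decompose_reward(feats, np.array([2.0, 0.0]), 10.0)
priorities = build_priorities([1.0, 5.0, 3.0])
```

In the paper these pieces are learned offline (the decomposition network is trained so the credits reconstruct the global reward); the sketch only shows the data flow from decomposed credits to a prioritized replay over individual trajectories.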
Pages: 11672 - 11680
Page count: 9
Related Papers
50 records
  • [1] Offline Multi-Agent Reinforcement Learning with Knowledge Distillation
    Tseng, Wei-Cheng
    Wang, Tsun-Hsuan
    Yen-Chen, Lin
    Isola, Phillip
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [2] Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning
    Shao, Jianzhun
    Qu, Yun
    Chen, Chen
    Zhang, Hongchang
    Ji, Xiangyang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] Offline Multi-Agent Reinforcement Learning in Custom Game Scenario
    Shukla, Indu
    Wilson, William R.
    Henslee, Althea C.
    Dozier, Haley R.
    2023 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE, CSCI 2023, 2023, : 329 - 331
  • [4] Online Tuning for Offline Decentralized Multi-Agent Reinforcement Learning
    Jiang, Jiechuan
    Lu, Zongqing
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 7, 2023, : 8050 - +
  • [5] Enhancing collaboration in multi-agent reinforcement learning with correlated trajectories
    Wang, Siying
    Du, Hongfei
    Zhou, Yang
    Zhao, Zhitong
    Zhang, Ruoning
    Chen, Wenyu
    KNOWLEDGE-BASED SYSTEMS, 2024, 305
  • [6] Reward-Poisoning Attacks on Offline Multi-Agent Reinforcement Learning
    Wu, Young
    McMahan, Jeremy
    Zhu, Xiaojin
    Xie, Qiaomin
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 9, 2023, : 10426 - 10434
  • [7] Multi-Agent Reinforcement Learning
    Stankovic, Milos
    2016 13TH SYMPOSIUM ON NEURAL NETWORKS AND APPLICATIONS (NEUREL), 2016, : 43 - 43
  • [8] Learning to Share in Multi-Agent Reinforcement Learning
    Yi, Yuxuan
    Li, Ge
    Wang, Yaowei
    Lu, Zongqing
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [9] Hierarchical multi-agent reinforcement learning
    Ghavamzadeh, Mohammad
    Mahadevan, Sridhar
    Makar, Rajbala
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2006, 13 : 197 - 229
  • [10] Multi-Agent Reinforcement Learning for Microgrids
    Dimeas, A. L.
    Hatziargyriou, N. D.
    IEEE POWER AND ENERGY SOCIETY GENERAL MEETING 2010, 2010,