Learning from Good Trajectories in Offline Multi-Agent Reinforcement Learning

Cited by: 0
Authors
Tian, Qi [1 ]
Kuang, Kun [1 ]
Liu, Furui [2 ]
Wang, Baoxiang [3 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[2] Huawei Noah's Ark Lab, Beijing, Peoples R China
[3] Chinese Univ Hong Kong Shenzhen, Sch Data Sci, Shenzhen, Peoples R China
Source
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 10 | 2023
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets, an important step toward deploying multi-agent systems in real-world applications. In practice, however, the individual behavior policies that generate the multi-agent joint trajectories often perform at different levels, e.g., one agent follows a random policy while the other agents follow medium-quality policies. In cooperative games with a global reward, an agent trained by existing offline MARL methods often inherits this random policy, jeopardizing the performance of the entire team. In this paper, we investigate offline MARL with explicit consideration of the diversity of agent-wise trajectories and propose a novel framework, Shared Individual Trajectories (SIT), to address this problem. Specifically, an attention-based reward decomposition network assigns credit to each agent through a differentiable key-value memory mechanism in an offline manner. These decomposed credits are then used to reconstruct the joint offline dataset into a prioritized experience replay of individual trajectories, after which agents can share their good trajectories and conservatively train their policies with a critic based on a graph attention network (GAT). We evaluate our method on both discrete control (i.e., StarCraft II and the multi-agent particle environment) and continuous control (i.e., multi-agent MuJoCo). The results indicate that our method achieves significantly better results on complex, mixed offline multi-agent datasets, especially when the difference in data quality between individual trajectories is large.
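The abstract's core idea, attention-based reward decomposition, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the dot-product scoring, and the toy feature vectors are all illustrative assumptions. The sketch only shows the general mechanism of treating per-agent features as keys and the global state as a query, so that softmax attention weights split the team reward into per-agent credits.

```python
import math

def decompose_reward(global_reward, agent_features, query):
    """Split a global team reward into per-agent credits via attention.

    Each agent's feature vector acts as a key; the global state acts as
    the query. Softmax over scaled dot-product scores yields weights that
    sum to 1, so the per-agent credits always sum to the team reward.
    """
    d = len(query)
    # scaled dot-product score for each agent's key against the query
    scores = [sum(q * k for q, k in zip(query, feat)) / math.sqrt(d)
              for feat in agent_features]
    # numerically stable softmax over the scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [global_reward * w for w in weights]

# toy example: three agents with 2-d features, team reward of 6.0
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.5, 0.5]
credits = decompose_reward(6.0, feats, query)
print(credits)  # credits sum to 6.0; agent 2's key matches the query best
```

In the paper's setting these credits would then drive the prioritized replay of individual trajectories; here they simply show that the decomposition is differentiable and conserves the total reward.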
Pages: 11672-11680
Page count: 9