Learning from Good Trajectories in Offline Multi-Agent Reinforcement Learning

Cited by: 0

Authors:

Tian, Qi [1]

Kuang, Kun [1]

Liu, Furui [2]

Wang, Baoxiang [3]

Affiliations:

[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China

[2] Huawei Noahs Ark Lab, Beijing, Peoples R China

[3] Chinese Univ Hong Kong Shenzhen, Sch Data Sci, Shenzhen, Peoples R China

Source:

THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 10 | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets, which is an important step toward the deployment of multi-agent systems in real-world applications. In practice, however, the individual behavior policies that generate multi-agent joint trajectories usually perform at different levels; for example, one agent may follow a random policy while the other agents follow medium policies. In a cooperative game with a global reward, an agent trained by existing offline MARL methods often inherits this random policy, jeopardizing the performance of the entire team. In this paper, we investigate offline MARL with explicit consideration of the diversity of agent-wise trajectories and propose a novel framework called Shared Individual Trajectories (SIT) to address this problem. Specifically, an attention-based reward decomposition network assigns credit to each agent through a differentiable key-value memory mechanism in an offline manner. These decomposed credits are then used to reconstruct the joint offline datasets into prioritized experience replay with individual trajectories, after which agents can share their good trajectories and conservatively train their policies with a graph attention network (GAT) based critic. We evaluate our method in both discrete control (i.e., StarCraft II and the multi-agent particle environment) and continuous control (i.e., multi-agent MuJoCo). The results indicate that our method achieves significantly better results on complex and mixed offline multi-agent datasets, especially when the difference in data quality between individual trajectories is large.
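The data-reconstruction step described in the abstract (splitting joint trajectories into individual ones and prioritizing them by decomposed credit) can be illustrated with a minimal sketch. This is not the authors' implementation: the per-agent credits are assumed to be given already (in the paper they come from the attention-based reward decomposition network), the softmax priority is an illustrative assumption, and all names (`IndividualTrajectory`, `split_joint_trajectory`, `sampling_weights`, `sample_shared`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class IndividualTrajectory:
    agent_id: int
    observations: list
    actions: list
    credits: list  # per-step credit, assumed to come from a reward-decomposition model


def split_joint_trajectory(joint_obs, joint_actions, per_agent_credits):
    """Split one joint trajectory into one trajectory per agent.

    Each argument is indexed as x[t][i]: time step t, agent i.
    """
    n_agents = len(joint_obs[0])
    return [
        IndividualTrajectory(
            agent_id=i,
            observations=[step[i] for step in joint_obs],
            actions=[step[i] for step in joint_actions],
            credits=[step[i] for step in per_agent_credits],
        )
        for i in range(n_agents)
    ]


def sampling_weights(trajectories, temperature=1.0):
    """Softmax over summed credits (an assumed priority rule, not the paper's)."""
    returns = [sum(t.credits) for t in trajectories]
    max_return = max(returns)  # subtract the max for numerical stability
    exps = [math.exp((r - max_return) / temperature) for r in returns]
    total = sum(exps)
    return [e / total for e in exps]


def sample_shared(trajectories, k, rng=random):
    """Draw k trajectories from the shared pool, favoring high-credit ones."""
    weights = sampling_weights(trajectories)
    return rng.choices(trajectories, weights=weights, k=k)
```

In this sketch, every agent samples from the same pool, so a trajectory produced by a well-performing agent is drawn more often than one produced by a random agent, regardless of which agent is being trained.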

Citation

Pages: 11672-11680

Number of pages: 9
