Learning from Good Trajectories in Offline Multi-Agent Reinforcement Learning

Citations: 0
|
Authors
Tian, Qi [1 ]
Kuang, Kun [1 ]
Liu, Furui [2 ]
Wang, Baoxiang [3 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[2] Huawei Noah's Ark Lab, Beijing, Peoples R China
[3] Chinese Univ Hong Kong Shenzhen, Sch Data Sci, Shenzhen, Peoples R China
Source
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 10 | 2023
Funding
National Natural Science Foundation of China;
Keywords
DOI
None available
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets, an important step toward deploying multi-agent systems in real-world applications. In practice, however, the individual behavior policies that generate multi-agent joint trajectories usually perform at different levels; e.g., one agent may be a random policy while the other agents follow medium-quality policies. In cooperative games with a global reward, an agent trained by existing offline MARL methods often inherits this random policy, jeopardizing the performance of the entire team. In this paper, we investigate offline MARL with explicit consideration of the diversity of agent-wise trajectories and propose a novel framework called Shared Individual Trajectories (SIT) to address this problem. Specifically, an attention-based reward decomposition network assigns credit to each agent through a differentiable key-value memory mechanism in an offline manner. These decomposed credits are then used to reconstruct the joint offline dataset into a prioritized experience replay of individual trajectories, after which agents can share their good trajectories and conservatively train their policies with a graph attention network (GAT) based critic. We evaluate our method in both discrete control (i.e., StarCraft II and the multi-agent particle environment) and continuous control (i.e., multi-agent MuJoCo). The results indicate that our method achieves significantly better performance on complex and mixed offline multi-agent datasets, especially when the difference in data quality between individual trajectories is large.
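The two-stage idea in the abstract — decompose the global reward into per-agent credits via attention, then prioritize individual trajectories by their credited return — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, the fixed query vector standing in for the learned key-value memory, and the simple return-proportional priority rule are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def decompose_reward(agent_feats, query, global_reward):
    """Split a shared global reward into per-agent credits.

    agent_feats: (n_agents, d) per-agent observation-action embeddings
    query: (d,) attention query (learned in the paper; fixed here)
    Returns per-agent credits that sum to global_reward.
    """
    scores = agent_feats @ query          # attention logits, (n_agents,)
    weights = softmax(scores)             # credit shares, sum to 1
    return weights * global_reward

def build_priorities(credited_returns, eps=1e-6):
    """Sampling priorities for individual trajectories: higher
    credited return -> more likely to be shared across agents."""
    r = np.asarray(credited_returns, dtype=float)
    r = r - r.min() + eps                 # shift to positive
    return r / r.sum()                    # normalized distribution

# Toy usage: 3 agents, one joint step with global reward 10.
feats = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
credits = decompose_reward(feats, np.array([2.0, 0.0]), 10.0)
priorities = build_priorities([1.0, 5.0, 3.0])
```

In the paper these pieces are learned offline (the decomposition network is trained so the credits reconstruct the global reward); the sketch only shows the data flow from decomposed credits to a prioritized replay over individual trajectories.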
Pages: 11672 - 11680
Page count: 9
Related Papers
50 records
  • [1] Offline Multi-Agent Reinforcement Learning with Knowledge Distillation
    Tseng, Wei-Cheng
    Wang, Tsun-Hsuan
    Yen-Chen, Lin
    Isola, Phillip
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [2] Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning
    Shao, Jianzhun
    Qu, Yun
    Chen, Chen
    Zhang, Hongchang
    Ji, Xiangyang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] Offline Multi-Agent Reinforcement Learning in Custom Game Scenario
    Shukla, Indu
    Wilson, William R.
    Henslee, Althea C.
    Dozier, Haley R.
    2023 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE, CSCI 2023, 2023, : 329 - 331
  • [4] Online Tuning for Offline Decentralized Multi-Agent Reinforcement Learning
    Jiang, Jiechuan
    Lu, Zongqing
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 7, 2023, : 8050 - +
  • [5] Enhancing collaboration in multi-agent reinforcement learning with correlated trajectories
    Wang, Siying
    Du, Hongfei
    Zhou, Yang
    Zhao, Zhitong
    Zhang, Ruoning
    Chen, Wenyu
    KNOWLEDGE-BASED SYSTEMS, 2024, 305
  • [6] Reward-Poisoning Attacks on Offline Multi-Agent Reinforcement Learning
    Wu, Young
    McMahan, Jeremy
    Zhu, Xiaojin
    Xie, Qiaomin
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 9, 2023, : 10426 - 10434
  • [7] Multi-Agent Reinforcement Learning
    Stankovic, Milos
    2016 13TH SYMPOSIUM ON NEURAL NETWORKS AND APPLICATIONS (NEUREL), 2016, : 43 - 43
  • [8] Learning to Share in Multi-Agent Reinforcement Learning
    Yi, Yuxuan
    Li, Ge
    Wang, Yaowei
    Lu, Zongqing
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [9] Hierarchical multi-agent reinforcement learning
    Ghavamzadeh, Mohammad
    Mahadevan, Sridhar
    Makar, Rajbala
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2006, 13 : 197 - 229
  • [10] Multi-Agent Reinforcement Learning for Microgrids
    Dimeas, A. L.
    Hatziargyriou, N. D.
    IEEE POWER AND ENERGY SOCIETY GENERAL MEETING 2010, 2010,