Policy distillation for efficient decentralized execution in multi-agent reinforcement learning

Times Cited: 0
Authors
Pei, Yuhang [1 ]
Ren, Tao [1 ]
Zhang, Yuxiang [1 ]
Sun, Zhipeng [1 ]
Champeyrol, Matys [2 ]
Affiliations
[1] Northeastern Univ, Software Coll, Shenyang 110169, Liaoning, Peoples R China
[2] CY Tech, 2 Blvd Lucien Favre CS 77563, F-64075 Pau, Pyrenees Atlant, France
Keywords
Multi-agent reinforcement learning; Transformers; Distillation; Computational efficiency
DOI
10.1016/j.neucom.2025.129617
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Cooperative Multi-Agent Reinforcement Learning (MARL) addresses complex scenarios in which multiple agents collaborate to achieve shared objectives. Training these agents in partially observable environments under the Centralized Training with Decentralized Execution (CTDE) paradigm remains challenging due to limited information access and the need for lightweight agent networks. To overcome these challenges, we introduce the Centralized Training and Policy Distillation for Decentralized Execution (CTPDE) framework. We propose a centralized dual-attention agent network that integrates the global state with local observations to enable lossless value decomposition and prevent homogeneous agent behaviors. In addition, we propose an efficient policy distillation method in which an action-value distribution network is distilled from the centralized agent network, preserving the efficiency of decentralized execution. Evaluation of CTPDE in benchmark environments demonstrates that the attention-based network achieves state-of-the-art performance during training. Moreover, the distilled agent network surpasses existing RNN-based methods and, in some cases, matches the capabilities of more complex architectures. These findings underscore the potential of CTPDE for advancing cooperative MARL tasks.
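The record includes no code, so the following is a minimal, hypothetical sketch of the policy-distillation step as the abstract describes it: a lightweight per-agent student network, which sees only local observations, is trained to match the action-value distribution produced by the centralized teacher network. All names here (StudentNet, distillation_loss, temperature) and the choice of a Hinton-style softened-KL objective are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of the distillation step described in the abstract:
# a lightweight decentralized "student" is trained to match the softened
# action-value distribution of the centralized "teacher" network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentNet(nn.Module):
    """Lightweight per-agent network that acts on local observations only."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # per-action value estimates

def distillation_loss(student_q, teacher_q, temperature: float = 1.0):
    """KL divergence between temperature-softened action-value distributions
    (Hinton-style distillation; the paper may use a different objective)."""
    log_p_student = F.log_softmax(student_q / temperature, dim=-1)
    p_teacher = F.softmax(teacher_q / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")

# Toy training step with stand-in data so the sketch runs standalone.
obs_dim, n_actions, batch = 32, 5, 16
student = StudentNet(obs_dim, n_actions)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

obs = torch.randn(batch, obs_dim)          # local observations
teacher_q = torch.randn(batch, n_actions)  # stand-in for teacher outputs

loss = distillation_loss(student(obs), teacher_q.detach())
opt.zero_grad(); loss.backward(); opt.step()
```

In a full CTDE pipeline, teacher_q would come from the frozen centralized dual-attention network conditioned on the global state; detaching it ensures gradients update only the student, which is all that is deployed at execution time.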
Pages: 13