Policy distillation for efficient decentralized execution in multi-agent reinforcement learning

Times Cited: 0
Authors
Pei, Yuhang [1 ]
Ren, Tao [1 ]
Zhang, Yuxiang [1 ]
Sun, Zhipeng [1 ]
Champeyrol, Matys [2 ]
Affiliations
[1] Northeastern Univ, Software Coll, Shenyang 110169, Liaoning, Peoples R China
[2] CY Tech, 2 Blvd Lucien Favre CS 77563, F-64075 Pau, Pyrenees Atlant, France
Keywords
Multi-agent reinforcement learning; Transformers; Distillation; Computational efficiency
DOI
10.1016/j.neucom.2025.129617
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Cooperative Multi-Agent Reinforcement Learning (MARL) addresses complex scenarios in which multiple agents collaborate to achieve shared objectives. Training these agents in partially observable environments under the Centralized Training with Decentralized Execution (CTDE) paradigm remains challenging due to limited information access and the need for lightweight agent networks. To overcome these challenges, we introduce the Centralized Training and Policy Distillation for Decentralized Execution (CTPDE) framework. We propose a centralized dual-attention agent network that integrates the global state with local observations to enable lossless value decomposition and prevent homogeneous agent behaviors. In addition, we propose an efficient policy distillation method in which an action-value distribution network is distilled from the centralized agent network, preserving the efficiency of decentralized execution. Evaluation of CTPDE in benchmark environments demonstrates that the attention-based network achieves state-of-the-art performance during training. Moreover, the distilled agent network surpasses existing RNN-based methods and, in some cases, matches the capabilities of more complex architectures. These findings underscore the potential of CTPDE for advancing cooperative MARL tasks.
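The record includes no code, so the following is a minimal, hypothetical sketch of the policy-distillation step as the abstract describes it: a lightweight per-agent student network, which sees only local observations, is trained to match the action-value distribution produced by the centralized teacher network. All names here (StudentNet, distillation_loss, temperature) and the choice of a Hinton-style softened-KL objective are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of the distillation step described in the abstract:
# a lightweight decentralized "student" is trained to match the softened
# action-value distribution of the centralized "teacher" network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentNet(nn.Module):
    """Lightweight per-agent network that acts on local observations only."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # per-action value estimates

def distillation_loss(student_q, teacher_q, temperature: float = 1.0):
    """KL divergence between temperature-softened action-value distributions
    (Hinton-style distillation; the paper may use a different objective)."""
    log_p_student = F.log_softmax(student_q / temperature, dim=-1)
    p_teacher = F.softmax(teacher_q / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")

# Toy training step with stand-in data so the sketch runs standalone.
obs_dim, n_actions, batch = 32, 5, 16
student = StudentNet(obs_dim, n_actions)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

obs = torch.randn(batch, obs_dim)          # local observations
teacher_q = torch.randn(batch, n_actions)  # stand-in for teacher outputs

loss = distillation_loss(student(obs), teacher_q.detach())
opt.zero_grad(); loss.backward(); opt.step()
```

In a full CTDE pipeline, teacher_q would come from the frozen centralized dual-attention network conditioned on the global state; detaching it ensures gradients update only the student, which is all that is deployed at execution time.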
Pages: 13