Selective policy transfer in multi-agent systems with sparse interactions

Times Cited: 0
Authors
Zhuang, Yunkai [1]
Liu, Yong [2]
Yang, Shangdong [1,3]
Gao, Yang [1]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] Natl Univ Singapore, Sch Comp, Singapore 119077, Singapore
[3] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing 210023, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Policy transfer; Option; Sparse interaction; Multi-agent reinforcement learning; REINFORCEMENT; MDPS;
DOI
10.1016/j.knosys.2024.112031
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Previously trained single-agent strategies, which are considerably easier to acquire than multi-agent strategies, are instructive for multi-agent reinforcement learning, especially when the interactions between agents are sparse. Traditional methods for transferring knowledge from single-agent source tasks to multi-agent target tasks typically rely on a pre-designed Markov-decision-process similarity function or metric for evaluating task similarity. In this study, we propose a selective policy transfer (SPOT) algorithm that eliminates the need to manually craft a metric function for assessing the similarity between source and target tasks. The SPOT algorithm enables agents to autonomously determine when, and which, policy to transfer by using well-trained single-agent policies as options during training. We introduce a multi-agent policy-learning option into the option library, allowing the SPOT algorithm to leverage transferred knowledge while concurrently learning new policies. The SPOT algorithm efficiently transfers sequential strategies in a few steps, thereby capturing high-level semantics. Experimental results in both multi-agent arcade and multi-agent particle environments demonstrate that the proposed algorithm outperforms state-of-the-art methods in terms of jump-start and convergence speed. Our evaluation indicators include jump start, steps to threshold, cumulative reward, and asymptotic performance. Furthermore, visualizations of the agents' strategies and termination functions help elucidate the operational principles of the SPOT algorithm.
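The abstract describes treating pre-trained single-agent policies as options alongside a concurrently learning multi-agent policy, with learned termination functions deciding when control returns to the high-level selector. The following is a minimal illustrative sketch of that options-framework structure, not the paper's implementation: the class and function names (`Option`, `run_episode`), the toy state transition, and the fixed termination probabilities are all hypothetical placeholders.

```python
import random

class Option:
    """An option in the options-framework sense: an intra-option policy
    paired with a termination condition (here a fixed probability)."""
    def __init__(self, name, policy, termination_prob):
        self.name = name
        self.policy = policy                    # maps state -> action
        self.termination_prob = termination_prob  # chance of returning control

def run_episode(options, high_level_choice, state, steps, rng):
    """Let a high-level controller pick an option, then follow that option's
    policy until its termination condition fires or the step budget runs out."""
    trajectory = []
    while steps > 0:
        opt = options[high_level_choice(state)]
        while steps > 0:
            action = opt.policy(state)
            trajectory.append((opt.name, state, action))
            state = state + 1  # toy deterministic transition
            steps -= 1
            if rng.random() < opt.termination_prob:
                break  # option terminates; selector chooses again
    return trajectory
```

In this sketch, one option would wrap a frozen single-agent source policy and another would wrap the still-learning multi-agent policy, so the selector can mix transferred and newly learned behavior within a single episode.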
Pages: 13