Selective policy transfer in multi-agent systems with sparse interactions

Times Cited: 0
Authors
Zhuang, Yunkai [1]
Liu, Yong [2]
Yang, Shangdong [1,3]
Gao, Yang [1]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] Natl Univ Singapore, Sch Comp, Singapore 119077, Singapore
[3] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing 210023, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Policy transfer; Option; Sparse interaction; Multi-agent reinforcement learning; REINFORCEMENT; MDPS;
DOI
10.1016/j.knosys.2024.112031
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Previously trained single-agent strategies, which are considerably easier to acquire than multi-agent strategies, are instructive for multi-agent reinforcement learning, especially when the interactions between agents are sparse. Traditional methods for transferring knowledge from single-agent source tasks to multi-agent target tasks typically rely on a pre-designed Markov-decision-process similarity function or metric for evaluating task similarity. In this study, we propose a selective policy transfer (SPOT) algorithm that eliminates the need to manually craft a metric function for assessing the similarity between source and target tasks. The SPOT algorithm enables agents to autonomously determine when, and which, policy to transfer by using well-trained single-agent policies as options during training. We introduce a multi-agent policy-learning option into the option library, allowing the SPOT algorithm to leverage transferred knowledge while concurrently learning new policies. The SPOT algorithm efficiently transfers sequential strategies in a few steps, thereby capturing high-level semantics. Experimental results in both multi-agent arcade and multi-agent particle environments demonstrate that the proposed algorithm outperforms state-of-the-art methods in terms of jump-start and convergence speed. Our evaluation indicators include jump start, steps to threshold, cumulative reward, and asymptotic performance. Furthermore, visualizations of the agents' strategies and termination functions help elucidate the operational principles of the SPOT algorithm.
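The abstract describes treating pre-trained single-agent policies as options alongside a concurrently learning multi-agent policy, with learned termination functions deciding when control returns to the high-level selector. The following is a minimal illustrative sketch of that options-framework structure, not the paper's implementation: the class and function names (`Option`, `run_episode`), the toy state transition, and the fixed termination probabilities are all hypothetical placeholders.

```python
import random

class Option:
    """An option in the options-framework sense: an intra-option policy
    paired with a termination condition (here a fixed probability)."""
    def __init__(self, name, policy, termination_prob):
        self.name = name
        self.policy = policy                    # maps state -> action
        self.termination_prob = termination_prob  # chance of returning control

def run_episode(options, high_level_choice, state, steps, rng):
    """Let a high-level controller pick an option, then follow that option's
    policy until its termination condition fires or the step budget runs out."""
    trajectory = []
    while steps > 0:
        opt = options[high_level_choice(state)]
        while steps > 0:
            action = opt.policy(state)
            trajectory.append((opt.name, state, action))
            state = state + 1  # toy deterministic transition
            steps -= 1
            if rng.random() < opt.termination_prob:
                break  # option terminates; selector chooses again
    return trajectory
```

In this sketch, one option would wrap a frozen single-agent source policy and another would wrap the still-learning multi-agent policy, so the selector can mix transferred and newly learned behavior within a single episode.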
Pages: 13