Sequence to Sequence Multi-agent Reinforcement Learning Algorithm

Cited by: 0
Authors
Shi T. [1]
Wang L. [1]
Huang Z. [1]
Affiliation
[1] College of Data Science, Taiyuan University of Technology, Jinzhong
Source
Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence | 2021 / Vol. 34 / No. 03
Funding
National Natural Science Foundation of China
Keywords
Block Structure; Deep Deterministic Policy Gradient (DDPG); Multi-agent Reinforcement Learning; Sequence to Sequence (Seq2Seq)
DOI
10.16451/j.cnki.issn1003-6059.202103002
Abstract
Existing multi-agent reinforcement learning algorithms have difficulty adapting to environments in which the agent scale (the number of agents) changes dynamically. To address this problem, a sequence to sequence multi-agent reinforcement learning algorithm (SMARL) based on sequential learning and a block structure is proposed. Each agent's control network is divided into an action network based on the deep deterministic policy gradient (DDPG) structure and a target network based on the sequence-to-sequence (Seq2Seq) structure, which removes the dependence of the algorithm's structure on the agent scale. The inputs and outputs of the algorithm are also processed so that the learned policy no longer depends on the agent scale. As a result, agents in SMARL can quickly adapt to new environments, take on different roles within a task, and learn quickly. Experiments show that the proposed algorithm outperforms the baseline algorithms in adaptability, performance, and training efficiency. © 2021, Science Press. All rights reserved.
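The abstract does not describe the internals of the Seq2Seq target network. As a minimal sketch, assuming a plain tanh RNN encoder-decoder (the class `Seq2SeqNet` and all of its parameters are hypothetical, not the authors' network, and training is omitted), the code below illustrates the general idea of decoupling structure from agent scale: the agents' inputs are fed in as a sequence, the same weights are reused at every position, and one output is decoded per agent, so the parameter count does not change when agents are added or removed.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(a, b):
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def tanh_vec(v):
    """Apply tanh to every component of a vector."""
    return [math.tanh(x) for x in v]


def rand_matrix(rows, cols, rng):
    """Small random matrix used as untrained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class Seq2SeqNet:
    """Hypothetical encoder-decoder whose weights are shared across agents,
    so its size is independent of how many agents are in the sequence."""

    def __init__(self, obs_dim, hidden_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Encoder RNN weights, reused for every agent position.
        self.w_in = rand_matrix(hidden_dim, obs_dim, rng)
        self.w_rec = rand_matrix(hidden_dim, hidden_dim, rng)
        # Decoder RNN weights, reused for every agent position.
        self.u_in = rand_matrix(hidden_dim, hidden_dim, rng)
        self.u_rec = rand_matrix(hidden_dim, hidden_dim, rng)
        # Output layer mapping one decoder state to one agent's output.
        self.v_out = rand_matrix(out_dim, hidden_dim, rng)

    def num_params(self):
        mats = [self.w_in, self.w_rec, self.u_in, self.u_rec, self.v_out]
        return sum(len(m) * len(m[0]) for m in mats)

    def forward(self, observations):
        """Map a list of per-agent observations to one output per agent."""
        # Encode: run the RNN over the agents as a sequence.
        h = [0.0] * self.hidden_dim
        encodings = []
        for obs in observations:
            h = tanh_vec(add(matvec(self.w_in, obs), matvec(self.w_rec, h)))
            encodings.append(h)
        # Decode: start from the final encoder state, emit one output per agent.
        d = h
        outputs = []
        for enc in encodings:
            d = tanh_vec(add(matvec(self.u_in, enc), matvec(self.u_rec, d)))
            # tanh bounds each output component to [-1, 1].
            outputs.append(tanh_vec(matvec(self.v_out, d)))
        return outputs


# Usage: the same network handles 3 agents and 7 agents unchanged.
net = Seq2SeqNet(obs_dim=4, hidden_dim=8, out_dim=2)
data_rng = random.Random(1)
for n_agents in (3, 7):
    obs = [[data_rng.uniform(-1, 1) for _ in range(4)] for _ in range(n_agents)]
    outs = net.forward(obs)
    print(n_agents, len(outs), net.num_params())  # prints "3 3 240", then "7 7 240"
```

Sharing weights across sequence positions is what lets the agent count vary freely; a fixed-width network that concatenates all agents' observations would instead need retraining whenever that count changes.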
Pages: 206-213
Number of pages: 7