Independent Deep Deterministic Policy Gradient Reinforcement Learning in Cooperative Multiagent Pursuit Games

Times Cited: 2
Authors
Zhou, Shiyang [1 ,2 ]
Ren, Weiya [1 ,2 ]
Ren, Xiaoguang [1 ,2 ]
Wang, Yanzhen [1 ,2 ]
Yi, Xiaodong [1 ,2 ]
Affiliations
[1] Def Innovat Inst, Artificial Intelligence Res Ctr, Beijing 100072, Peoples R China
[2] Tianjin Artificial Intelligence Innovat Ctr, Tianjin 300457, Peoples R China
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT IV | 2021 / Vol. 12894
Keywords
Reinforcement learning; Actor-critic; Potential field; Planning and learning; Predator-prey
DOI
10.1007/978-3-030-86380-7_51
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we study a fully decentralized multi-agent pursuit problem in a non-communication environment. Fully decentralized training and execution offers stronger robustness and scalability than centralized training with decentralized execution (CTDE), the currently popular multi-agent reinforcement learning paradigm. Both centralized training and communication mechanisms require a large amount of information exchange between agents, which is a strong assumption that is difficult to satisfy in reality. However, traditional fully decentralized multi-agent reinforcement learning methods (e.g., IQL) struggle to converge stably because the other agents' strategies keep changing. Therefore, we extend the actor-critic framework to an actor-critic-N framework and, on this basis, propose the Potential-Field-Guided Deep Deterministic Policy Gradient (PGDDPG) method. Each agent uses a unified artificial potential field to guide its strategy updates, which reduces the uncertainty of multi-agent decision making in a complex and dynamically changing environment. As a result, the proposed PGDDPG converges quickly and stably. Finally, through pursuit experiments in MPE and CARLA, we show that our method achieves a higher success rate and more stable performance than DDPG and MADDPG.
Pages: 625-637
Number of Pages: 13
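The abstract describes guiding each agent's policy update with a unified artificial potential field on top of an otherwise independent DDPG learner. The record itself does not give PGDDPG's exact formulation, so the sketch below is only an illustration under assumptions: it uses the standard potential-based reward-shaping form F(s, s') = gamma * Phi(s') - Phi(s), an attractive term toward the prey plus a repulsive term between predators, and hypothetical constants (GAMMA, K_ATT, K_REP, SAFE_DIST) and observation fields ("self", "prey", "teammates") chosen for this example.

# Illustrative sketch only (not the paper's exact method): potential-based
# reward shaping added to each predator's local reward before a fully
# decentralized DDPG update. All constants and observation keys are assumed.
import numpy as np

GAMMA = 0.95        # discount factor (assumed value)
K_ATT = 1.0         # attractive gain toward the prey (assumed)
K_REP = 0.5         # repulsive gain between predators (assumed)
SAFE_DIST = 0.3     # teammate repulsion radius (assumed)

def potential(self_pos, prey_pos, teammate_positions):
    """Unified artificial potential: attraction to the prey plus
    repulsion from teammates closer than SAFE_DIST."""
    phi = -K_ATT * np.linalg.norm(self_pos - prey_pos)
    for q in teammate_positions:
        d = np.linalg.norm(self_pos - q)
        if d < SAFE_DIST:
            phi -= K_REP * (SAFE_DIST - d)
    return phi

def shaped_reward(env_reward, obs, next_obs):
    """Add the potential-based shaping term F = gamma * Phi(s') - Phi(s)
    to the environment reward, computed from local observations only."""
    phi = potential(obs["self"], obs["prey"], obs["teammates"])
    phi_next = potential(next_obs["self"], next_obs["prey"], next_obs["teammates"])
    return env_reward + GAMMA * phi_next - phi

# Example: one shaping step for a single predator (hypothetical observations).
obs = {"self": np.array([0.0, 0.0]), "prey": np.array([1.0, 1.0]),
       "teammates": [np.array([0.2, 0.1])]}
next_obs = {"self": np.array([0.1, 0.1]), "prey": np.array([1.0, 1.0]),
            "teammates": [np.array([0.3, 0.2])]}
print(shaped_reward(0.0, obs, next_obs))  # positive: the predator moved toward the prey

Shaping of this potential-based form is known not to change the optimal policy in the single-agent case, which is one reason an artificial potential field is a natural guidance signal for speeding up and stabilizing decentralized learning as claimed in the abstract.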