Independent Deep Deterministic Policy Gradient Reinforcement Learning in Cooperative Multiagent Pursuit Games

Times Cited: 2
Authors
Zhou, Shiyang [1 ,2 ]
Ren, Weiya [1 ,2 ]
Ren, Xiaoguang [1 ,2 ]
Wang, Yanzhen [1 ,2 ]
Yi, Xiaodong [1 ,2 ]
Affiliations
[1] Def Innovat Inst, Artificial Intelligence Res Ctr, Beijing 100072, Peoples R China
[2] Tianjin Artificial Intelligence Innovat Ctr, Tianjin 300457, Peoples R China
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT IV | 2021 / Vol. 12894
Keywords
Reinforcement learning; Actor-critic; Potential field; Planning and learning; Predator-prey
DOI
10.1007/978-3-030-86380-7_51
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we study a fully decentralized multi-agent pursuit problem in a non-communication environment. Fully decentralized training and execution offers stronger robustness and scalability than centralized training with decentralized execution (CTDE), the currently popular multi-agent reinforcement learning paradigm. Both centralized training and communication mechanisms require a large amount of information exchange between agents, which is a strong assumption that is difficult to satisfy in reality. However, traditional fully decentralized multi-agent reinforcement learning methods (e.g., IQL) struggle to converge stably because the other agents' strategies keep changing. Therefore, we extend the actor-critic framework to an actor-critic-N framework and, on this basis, propose the Potential-Field-Guided Deep Deterministic Policy Gradient (PGDDPG) method. Each agent uses a unified artificial potential field to guide its strategy updates, which reduces the uncertainty of multi-agent decision making in a complex and dynamically changing environment. As a result, the proposed PGDDPG converges quickly and stably. Finally, through pursuit experiments in MPE and CARLA, we show that our method achieves a higher success rate and more stable performance than DDPG and MADDPG.
Pages: 625-637
Number of Pages: 13
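The abstract describes guiding each agent's policy update with a unified artificial potential field on top of an otherwise independent DDPG learner. The record itself does not give PGDDPG's exact formulation, so the sketch below is only an illustration under assumptions: it uses the standard potential-based reward-shaping form F(s, s') = gamma * Phi(s') - Phi(s), an attractive term toward the prey plus a repulsive term between predators, and hypothetical constants (GAMMA, K_ATT, K_REP, SAFE_DIST) and observation fields ("self", "prey", "teammates") chosen for this example.

# Illustrative sketch only (not the paper's exact method): potential-based
# reward shaping added to each predator's local reward before a fully
# decentralized DDPG update. All constants and observation keys are assumed.
import numpy as np

GAMMA = 0.95        # discount factor (assumed value)
K_ATT = 1.0         # attractive gain toward the prey (assumed)
K_REP = 0.5         # repulsive gain between predators (assumed)
SAFE_DIST = 0.3     # teammate repulsion radius (assumed)

def potential(self_pos, prey_pos, teammate_positions):
    """Unified artificial potential: attraction to the prey plus
    repulsion from teammates closer than SAFE_DIST."""
    phi = -K_ATT * np.linalg.norm(self_pos - prey_pos)
    for q in teammate_positions:
        d = np.linalg.norm(self_pos - q)
        if d < SAFE_DIST:
            phi -= K_REP * (SAFE_DIST - d)
    return phi

def shaped_reward(env_reward, obs, next_obs):
    """Add the potential-based shaping term F = gamma * Phi(s') - Phi(s)
    to the environment reward, computed from local observations only."""
    phi = potential(obs["self"], obs["prey"], obs["teammates"])
    phi_next = potential(next_obs["self"], next_obs["prey"], next_obs["teammates"])
    return env_reward + GAMMA * phi_next - phi

# Example: one shaping step for a single predator (hypothetical observations).
obs = {"self": np.array([0.0, 0.0]), "prey": np.array([1.0, 1.0]),
       "teammates": [np.array([0.2, 0.1])]}
next_obs = {"self": np.array([0.1, 0.1]), "prey": np.array([1.0, 1.0]),
            "teammates": [np.array([0.3, 0.2])]}
print(shaped_reward(0.0, obs, next_obs))  # positive: the predator moved toward the prey

Shaping of this potential-based form is known not to change the optimal policy in the single-agent case, which is one reason an artificial potential field is a natural guidance signal for speeding up and stabilizing decentralized learning as claimed in the abstract.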