Cooperative Multi-Agent Deep Reinforcement Learning with Counterfactual Reward

Cited by: 2

Authors:

Shao, Kun [1,2]

Zhu, Yuanheng [1]

Tang, Zhentao [1,2]

Zhao, Dongbin [1,2]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China

Source:

2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2020

Keywords:

reinforcement learning; deep reinforcement learning; cooperative games; counterfactual reward; LEVEL; GAME; GO;

DOI:

10.1109/ijcnn48605.2020.9207169

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In partially observable, fully cooperative games, agents generally aim to maximize a global reward through their joint actions, so it is difficult for each agent to deduce its own contribution. To address this credit assignment problem, we propose a multi-agent reinforcement learning algorithm with a counterfactual reward mechanism, termed CoRe. CoRe computes the difference in global reward between the case where an agent takes its actual action and the cases where it takes other actions instead, while the other agents keep their actual actions fixed. This difference measures each agent's contribution to the global reward. We evaluate CoRe in a simplified Pig Chase game with a decentralised Deep Q-Network (DQN) framework. The proposed method helps agents learn collaborative behaviors end to end. Compared with other DQN variants trained on the global reward, CoRe significantly improves learning efficiency and achieves better results. CoRe also shows excellent performance in game environments of various sizes.
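The counterfactual reward described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this sketch assumes the counterfactual reward is the actual global reward minus the mean global reward over the agent's alternative actions, with the other agents' actions held fixed. The names `counterfactual_reward` and `toy_global_reward`, and the two-agent toy game, are hypothetical and not taken from the paper.

```python
from typing import Callable, Sequence, Tuple

JointAction = Tuple[int, ...]


def counterfactual_reward(
    global_reward: Callable[[JointAction], float],
    joint_action: JointAction,
    agent: int,
    actions: Sequence[int],
) -> float:
    """Actual global reward minus the mean global reward obtained when
    `agent` takes each alternative action and all other agents keep
    their actual actions. The averaging is an assumption of this sketch."""
    actual = global_reward(joint_action)
    alternatives = [a for a in actions if a != joint_action[agent]]
    if not alternatives:
        return 0.0
    counterfactuals = []
    for a in alternatives:
        # Swap in the alternative action for this agent only.
        replaced = joint_action[:agent] + (a,) + joint_action[agent + 1:]
        counterfactuals.append(global_reward(replaced))
    return actual - sum(counterfactuals) / len(counterfactuals)


def toy_global_reward(joint_action: JointAction) -> float:
    # Hypothetical game: action 1 means "close in on the pig"; the team
    # earns 1.0 only when every agent closes in.
    return 1.0 if all(a == 1 for a in joint_action) else 0.0
```

In this toy game, an agent that closes in while its partner also closes in receives a positive counterfactual reward, while an agent that stays back when its partner closes in receives a negative one, even though the global reward alone is 0 for every joint action except mutual cooperation.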


Pages: 8
