CuMARL: Curiosity-Based Learning in Multiagent Reinforcement Learning

Cited by: 5

Authors:

Ningombam, Devarani Devi ^{[1,2]}

Yoo, Byunghyun ^{[2,3]}

Kim, Hyun Woo ^{[2,3]}

Song, Hwa Jeon ^{[2,3,4]}

Yi, Sungwon ^{[2,3,5]}

Affiliations:

[1] Univ Petr & Energy Studies UPES, Dept Informat, Sch Comp Sci, Uttarakhand 248007, India

[2] Elect & Telecommunicat Res Inst ETRI, Daejeon 34129, South Korea

[3] Elect & Telecommunicat Res Inst ETRI, Daejeon, South Korea

[4] Elect & Telecommunicat Res Inst, ETRI, Daejeon, South Korea

[5] Elect & Telecommunicat Res Inst, ETRI, Daejeon, South Korea

Source:

IEEE ACCESS | 2022 / Vol. 10

Keywords:

Training data; Reinforcement learning; Mutual information; Games; Decision making; Behavioral sciences; Multi-agent systems; Multi-agent reinforcement learning; curiosity; conditional mutual information; prioritized experience replay;

DOI:

10.1109/ACCESS.2022.3198981

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

In this paper, we propose a novel curiosity-based learning algorithm for multi-agent reinforcement learning (MARL) to attain efficient and effective decision-making. We employ the centralized training with decentralized execution (CTDE) framework and assume that each agent knows the prior action distribution of the others. To quantify curiosity, that is, the difference in agents' knowledge, we introduce conditional mutual information (CMI) regularization and use the resulting amount of information to update the decision-making policy. Then, to deploy this learning framework in a large-scale MARL setting while achieving high sample efficiency, we prioritize experiences based on the Kullback-Leibler (KL) divergence. We evaluate the effectiveness of the proposed algorithm in three different levels of StarCraft Multi-Agent Challenge (SMAC) scenarios using the PyMARL framework. The simulation-based performance analysis shows that the proposed technique significantly improves the test win rate compared to various state-of-the-art MARL benchmarks, such as Optimistically Weighted Monotonic Value Function Factorization (OW_QMIX) and Learning Individual Intrinsic Reward (LIIR).

Pages: 87254-87265

Page count: 12
