CuMARL: Curiosity-Based Learning in Multiagent Reinforcement Learning

Cited by: 5
Authors
Ningombam, Devarani Devi [1, 2]
Yoo, Byunghyun [2, 3]
Kim, Hyun Woo [2, 3]
Song, Hwa Jeon [2, 3, 4]
Yi, Sungwon [2, 3, 5]
Affiliations
[1] Univ Petr & Energy Studies UPES, Dept Informat, Sch Comp Sci, Uttarakhand 248007, India
[2] Elect & Telecommunicat Res Inst ETRI, Daejeon 34129, South Korea
[3] Elect & Telecommunicat Res Inst ETRI, Daejeon, South Korea
[4] Elect & Telecommunicat Res Inst, ETRI, Daejeon, South Korea
[5] Elect & Telecommunicat Res Inst, ETRI, Daejeon, South Korea
Keywords
Training data; Reinforcement learning; Mutual information; Games; Decision making; Behavioral sciences; Multi-agent systems; Multi-agent reinforcement learning; curiosity; conditional mutual information; prioritized experience replay
DOI
10.1109/ACCESS.2022.3198981
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we propose a novel curiosity-based learning algorithm for multi-agent reinforcement learning (MARL) that aims at efficient and effective decision-making. We adopt the centralized training with decentralized execution (CTDE) framework and assume that each agent knows the prior action distribution of the other agents. To quantify curiosity, understood here as the difference in the agents' knowledge, we introduce conditional mutual information (CMI) regularization and use the resulting amount of information to update the decision-making policy. To deploy this learning framework in large-scale MARL settings with high sample efficiency, we prioritize experiences by their Kullback-Leibler (KL) divergence. We evaluate the proposed algorithm on three different levels of StarCraft Multi-Agent Challenge (SMAC) scenarios using the PyMARL framework. The simulation-based performance analysis shows that the proposed technique significantly improves the test win rate compared to state-of-the-art MARL baselines such as Optimistically Weighted Monotonic Value Function Factorization (OW_QMIX) and Learning Individual Intrinsic Reward (LIIR).
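As a rough illustration of what KL divergence-based prioritization of experiences can look like, the sketch below scores each stored transition by the KL divergence between two discrete action distributions and converts the scores into sampling probabilities. This is not the authors' implementation: the abstract does not say which distributions are compared, so the choice of a policy distribution versus an assumed prior, the exponent `alpha`, and all function names are assumptions made for illustration only.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions stored along the last axis."""
    # Clipping avoids log(0); eps is an illustrative choice.
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)


def sampling_probabilities(priorities, alpha=0.6, eps=1e-6):
    """Map non-negative priorities to probabilities (hypothetical scheme)."""
    # alpha = 0 gives uniform sampling; larger alpha favors high priority.
    scaled = (np.asarray(priorities, dtype=float) + eps) ** alpha
    return scaled / scaled.sum()


def sample_batch(rng, priorities, batch_size, alpha=0.6):
    """Draw transition indices in proportion to their scaled priority."""
    probs = sampling_probabilities(priorities, alpha)
    return rng.choice(len(probs), size=batch_size, p=probs)


# Assumed setup: one row per stored transition, one column per action.
policy_dists = np.array([[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]])
prior_dists = np.array([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])

priorities = kl_divergence(policy_dists, prior_dists)
probs = sampling_probabilities(priorities)
batch = sample_batch(np.random.default_rng(0), priorities, batch_size=4)
```

In this toy setup the first transition, whose two distributions coincide, receives a priority close to zero and is therefore sampled least often.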
Pages: 87254-87265
Number of pages: 12