Generating individual intrinsic reward for cooperative multiagent reinforcement learning

Cited by: 6
Authors
Wu, Haolin [1]
Li, Hui [2]
Zhang, Jianwei [2]
Wang, Zhuang [1]
Zhang, Jianeng [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu, Peoples R China
[2] Sichuan Univ, Natl Key Lab Fundamental Sci Synthet Vis, Chengdu, Peoples R China
Source
INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS | 2021, Vol. 18, No. 05
Keywords
Cooperative multiagent reinforcement learning; lazy agent problem; global reward; intrinsic reward; generation
DOI
10.1177/17298814211044946
CLC number
TP24 [Robotics]
Subject classification codes
080202; 1405
Abstract
Multiagent reinforcement learning holds considerable promise for cooperative multiagent tasks. Unfortunately, the single global reward shared by all agents in such tasks can give rise to the lazy agent problem, in which some agents contribute little yet still receive the full team reward. To cope with this problem, we propose a generating individual intrinsic reward algorithm, which introduces an intrinsic reward encoder to generate an individual intrinsic reward for each agent and uses hypernetworks as a decoder to help value decomposition methods estimate individual action values from the generated intrinsic rewards. Experimental results on the StarCraft II micromanagement benchmark show that the proposed algorithm increases learning efficiency and improves policy performance.
Pages: 8