Learning Distinct Strategies for Heterogeneous Cooperative Multi-agent Reinforcement Learning

Cited by: 0
Authors
Wan, Kejia [1 ]
Xu, Xinhai [2 ]
Li, Yuan [2 ]
Affiliations
[1] Def Innovat Inst, Beijing, Peoples R China
[2] Acad Mil Sci, Beijing, Peoples R China
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT IV | 2021, Vol. 12894
Funding
National Natural Science Foundation of China;
Keywords
Multi-agent reinforcement learning; Heterogeneity; Transfer learning;
DOI
10.1007/978-3-030-86380-7_44
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Value decomposition has been a promising paradigm for cooperative multi-agent reinforcement learning. Many approaches have been proposed, but few of them consider heterogeneous settings. Agents with vastly different behaviours pose great challenges for centralized training with decentralized execution. In this paper, we provide a formulation of heterogeneous multi-agent reinforcement learning together with some theoretical analysis. On top of that, we propose an efficient two-stage heterogeneous learning method. The first stage applies a transfer technique that tunes existing homogeneous models into heterogeneous ones, which accelerates convergence. In the second stage, an iterative learning procedure with centralized training is designed to improve overall performance. We conduct experiments on heterogeneous unit micromanagement tasks in StarCraft II. The results show that our method improves the win rate by around 20% on the most difficult scenario compared with state-of-the-art methods, namely QMIX and Weighted QMIX.
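The two-stage scheme in the abstract can be pictured with a minimal sketch. Everything below is an illustrative assumption, not the paper's code: the names (AgentQNet, stage1_transfer, stage2_iterative), the network sizes, and the placeholder TD loss are invented, and the actual method would compute targets through a centralized mixing network such as QMIX rather than from a given target tensor. Stage 1 clones a pretrained homogeneous agent network once per agent type; stage 2 then fine-tunes one type's network at a time.

```python
# Hypothetical sketch of the two-stage method described in the abstract.
# All names and details are illustrative assumptions, not the paper's code.
import copy
import torch
import torch.nn as nn

class AgentQNet(nn.Module):
    """Per-agent utility network (QMIX-style): observation -> Q-value per action."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def stage1_transfer(homogeneous_model: AgentQNet, agent_types: list) -> dict:
    """Stage 1 (assumed): initialize one network per agent type by cloning a
    model pretrained on a homogeneous task, so heterogeneous training starts
    from a shared policy instead of from scratch."""
    return {t: copy.deepcopy(homogeneous_model) for t in agent_types}

def stage2_iterative(models: dict, batches: dict, epochs: int = 10) -> None:
    """Stage 2 (assumed): iterate over agent types, updating one type's
    network at a time while the others stay fixed. The real method would form
    the TD target via a centralized mixer; a given target_q stands in here."""
    optimizers = {t: torch.optim.Adam(m.parameters(), lr=5e-4)
                  for t, m in models.items()}
    for _ in range(epochs):
        for agent_type, model in models.items():
            opt = optimizers[agent_type]
            for obs, target_q in batches[agent_type]:
                q = model(obs).max(dim=-1).values           # greedy per-agent utility
                loss = nn.functional.mse_loss(q, target_q)  # placeholder TD loss
                opt.zero_grad()
                loss.backward()
                opt.step()

# Toy usage: two agent types initialized from one pretrained homogeneous model.
if __name__ == "__main__":
    pretrained = AgentQNet(obs_dim=8, n_actions=4)
    models = stage1_transfer(pretrained, ["marine", "medivac"])
    batches = {t: [(torch.randn(32, 8), torch.randn(32))] for t in models}
    stage2_iterative(models, batches, epochs=2)
```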
Pages: 544-555
Page count: 12