A survey and critique of multiagent deep reinforcement learning

Cited by: 385
Authors
Hernandez-Leal, Pablo [1 ]
Kartal, Bilal [1 ]
Taylor, Matthew E. [1 ]
Affiliations
[1] Borealis AI, Edmonton, AB, Canada
Keywords
Multiagent learning; Multiagent systems; Multiagent reinforcement learning; Deep reinforcement learning; Survey; NEURAL-NETWORKS; COMPREHENSIVE SURVEY; FICTITIOUS PLAY; GAMES; ALGORITHMS; LEARNERS; AGENTS; APPROXIMATION; INTELLIGENCE; COORDINATION;
DOI
10.1007/s10458-019-09421-1
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Subject Classification
0812
Abstract
Deep reinforcement learning (RL) has achieved outstanding results in recent years, leading to a dramatic increase in the number of applications and methods. Recent works have explored learning beyond single-agent scenarios and have considered multiagent learning (MAL) settings. Initial results report successes in complex multiagent domains, although several challenges remain to be addressed. The primary goal of this article is to provide a clear overview of the current multiagent deep reinforcement learning (MDRL) literature. Additionally, we complement the overview with a broader analysis: (i) we revisit previous key components, originally presented in MAL and RL, and highlight how they have been adapted to multiagent deep reinforcement learning settings; (ii) we provide general guidelines to new practitioners in the area, describing lessons learned from MDRL works, pointing to recent benchmarks, and outlining open avenues of research; (iii) we take a more critical tone, raising practical challenges of MDRL (e.g., implementation and computational demands). We expect this article will help unify and motivate future research to take advantage of the abundant literature that exists (e.g., RL and MAL) in a joint effort to promote fruitful research in the multiagent community.
Pages: 750-797
Page count: 48