A survey and critique of multiagent deep reinforcement learning

Cited by: 7
Authors
Pablo Hernandez-Leal
Bilal Kartal
Matthew E. Taylor
Affiliation
[1] Borealis AI,
Source
Autonomous Agents and Multi-Agent Systems | 2019 / Vol. 33
Keywords
Multiagent learning; Multiagent systems; Multiagent reinforcement learning; Deep reinforcement learning; Survey;
DOI
Not available
Abstract
Deep reinforcement learning (RL) has achieved outstanding results in recent years. This has led to a dramatic increase in the number of applications and methods. Recent works have explored learning beyond single-agent scenarios and have considered multiagent learning (MAL) scenarios. Initial results report successes in complex multiagent domains, although there are several challenges to be addressed. The primary goal of this article is to provide a clear overview of current multiagent deep reinforcement learning (MDRL) literature. Additionally, we complement the overview with a broader analysis. (i) We revisit key components originally presented in MAL and RL and highlight how they have been adapted to MDRL settings. (ii) We provide general guidelines for new practitioners in the area: describing lessons learned from MDRL works, pointing to recent benchmarks, and outlining open avenues of research. (iii) We take a more critical tone, raising practical challenges of MDRL (e.g., implementation and computational demands). We expect this article will help unify and motivate future research to take advantage of the abundant literature that exists (e.g., RL and MAL) in a joint effort to promote fruitful research in the multiagent community.
Pages: 750-797
Page count: 47