Human-level performance in 3D multiplayer games with population-based reinforcement learning

Cited by: 364
Authors
Jaderberg, Max [1]
Czarnecki, Wojciech M. [1]
Dunning, Iain [1]
Marris, Luke [1]
Lever, Guy [1]
Castaneda, Antonio Garcia [1]
Beattie, Charles [1]
Rabinowitz, Neil C. [1]
Morcos, Ari S. [1]
Ruderman, Avraham [1]
Sonnerat, Nicolas [1]
Green, Tim [1]
Deason, Louise [1]
Leibo, Joel Z. [1]
Silver, David [1]
Hassabis, Demis [1]
Kavukcuoglu, Koray [1]
Graepel, Thore [1]
Affiliations
[1] DeepMind, London, England
Keywords
TIME; GO;
DOI
10.1126/science.aau6249
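Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]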
Discipline classification code
07; 0710; 09
Abstract
Reinforcement learning (RL) has shown great success in increasingly complex single-agent environments and two-player turn-based games. However, the real world contains multiple agents, each learning and acting independently to cooperate and compete with other agents. We used a tournament-style evaluation to demonstrate that an agent can achieve human-level performance in a three-dimensional multiplayer first-person video game, Quake III Arena in Capture the Flag mode, using only pixels and game points scored as input. We used a two-tier optimization process in which a population of independent RL agents are trained concurrently from thousands of parallel matches on randomly generated environments. Each agent learns its own internal reward signal and rich representation of the world. These results indicate the great potential of multiagent reinforcement learning for artificial intelligence research.
Pages: 859 / +
Page count: 47