共 54 条
- [1] Mnih V(2015)Human-level control through deep reinforcement learning. nature 518 529-533
- [2] Silver D(2016)Mastering the game of go with deep neural networks and tree search nature 529 484-489
- [3] Levine S(2016)End-to-end training of deep visuomotor policies The J. Mach. Learn. Res. 17 1334-1373
- [4] Finn C(2005)Cooperative multi-agent learning: The state of the art Auton. Agent Multi-Agent Syst. 11 387-434
- [5] Darrell T(2019)A survey and critique of multiagent deep reinforcement learning Auton. Agent Multi-Agent Syst. 33 750-797
- [6] Abbeel P(2003)Nash q-learning for general-sum stochastic games J. Mach. Learn. Res. 4 1039-1069
- [7] Panait L(2001)Awesome: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents Friend-or-foe q-learning in general-sum games 1 322-328
- [8] Luke S(2003)Lenient learning in independent-learner stochastic cooperative games Correlated q-learning 3 242-249
- [9] Hernandez-Leal P(2007)Value-function reinforcement learning in markov games Mach. Learn. 67 23-43
- [10] Kartal B(2016)Near-optimal reinforcement learning with self-play The J. Mach. Learn. Res. 17 2914-2955