[1]
Watkins C J C H., Learning from delayed rewards, (1989)
[2]
Mnih V, Kavukcuoglu K, Silver D, et al., Playing Atari with deep reinforcement learning, Proceedings of the Workshops at the 26th Neural Information Processing Systems (NIPS), pp. 201-220, (2013)
[3]
Mnih V, Kavukcuoglu K, Silver D, et al., Human-level control through deep reinforcement learning, Nature, 518, 7540, pp. 529-533, (2015)
[4]
Liu Q, Zhai J, Zhang Z, et al., A survey on deep reinforcement learning, Chinese Journal of Computers, 41, 1, pp. 1-27, (2018)
[5]
Silver D, Huang A, Maddison C J, et al., Mastering the game of Go with deep neural networks and tree search, Nature, 529, 7587, pp. 484-489, (2016)
[6]
Silver D, Schrittwieser J, Simonyan K, et al., Mastering the game of Go without human knowledge, Nature, 550, 7676, pp. 354-359, (2017)
[7]
Vinyals O, Babuschkin I, Czarnecki W M, et al., Grandmaster level in StarCraft II using multi-agent reinforcement learning, Nature, 575, 7782, pp. 350-354, (2019)
[8]
Polydoros A S, Nalpantidis L., Survey of model-based reinforcement learning: Applications on robotics, Journal of Intelligent & Robotic Systems, 86, 2, pp. 153-173, (2017)
[9]
Yu L, Zhang W, Wang J, et al., SeqGAN: Sequence generative adversarial nets with policy gradient, Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI), pp. 2852-2858, (2017)
[10]
Ng A Y, Coates A, Diel M, et al., Autonomous inverted helicopter flight via reinforcement learning, Experimental Robotics IX, pp. 363-372, (2006)