50 entries in total
- [2] Bello I., 2017, Neural combinatorial optimization with reinforcement learning, P1
- [3] Chorowski J., 2014, End-to-End Continuous Speech Recognition Using Attention-Based Recurrent NN: First Results, P1
- [6] A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2012, 42(6): 1291-1307
- [7] Guo M., 2020, Conference on Robot Learning, P283
- [8] A new Q-learning algorithm based on the Metropolis criterion [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2004, 34(5): 2140-2143
- [10] Joshi Chaitanya K., 2019, NEURIPS 2019 GRAPH R