共 30 条
[1]
Achiam J, 2017, PR MACH LEARN RES, V70
[2]
[Anonymous], 1960, Finite Markov Chains
[3]
[Anonymous], 2007, Stochastic Learning and Optimization: A Sensitivity-Based Approach
[4]
Bertsekas D., 1995, Dynamic Programming and Optimal Control, VII
[5]
Bertsekas DP, 1995, PROCEEDINGS OF THE 34TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-4, P560, DOI 10.1109/CDC.1995.478953
[6]
Brockman Greg, 2016, ARXIV160601540
[7]
Cui D. M., 2016, NEWZOO, V40, P8225
[8]
Dewanto Vektor, 2020, ARXIV201008920
[9]
Duan Y., 2016, PR MACH LEARN RES, P1329, DOI [DOI 10.1109/CVPR.2014.180, 10.5555/3045390.3045531]
[10]
Howard Ronald A., 1960, MATH GAZ, V3, P120