19 entries in total
[1]
[Anonymous], 2019, GITHUB PYTHON MDPTOO
[3]
Bertsekas D.P., 1996, NEURO DYNAMIC PROGRA, V3
[4]
Borkar V.S., 2008, STOCHASTIC APPROXIMA
[5]
Dai B, 2018, PR MACH LEARN RES, V80
[6]
Devraj A. M., 2017, Advances in Neural Information Processing Systems (NeurIPS), P2235
[7]
Furmston T, 2016, J MACH LEARN RES, V17
[8]
Goyal V, 2021, arXiv, DOI arXiv:1905.09963
[9]
Haarnoja T, 2017, PR MACH LEARN RES, V70
[10]
Haarnoja T, 2018, PR MACH LEARN RES, V80