40 entries in total
[1]
Abachi R, 2021, Arxiv, DOI arXiv:2003.00030
[2]
[Anonymous], 1996, Neuro-dynamic Programming
[3]
Beck J, 2024, Arxiv, DOI [arXiv:2301.08028, 10.48550/ARXIV.2301.08028]
[5]
Beukman Michael, 2024, Advances in Neural Information Processing Systems, V36
[7]
Dalal Murtaza, 2018, ICLR
[8]
Duan Y, 2016, Arxiv, DOI [arXiv:1611.02779, 10.48550/arXiv.1611.02779]
[9]
Duff Michael O'Gordon, 2002, Optimal Learning: Computational Procedures for Bayes-Adaptive Markov Decision Processes
[10]
Perez CF, 2018, Arxiv, DOI arXiv:1812.03399