15 entries in total
[1]
Costa P., 2022, ICLR Blog
[2]
Gebotys B, 2022, arXiv, DOI 10.48550/arXiv.2201.09104
[3]
Lu Shuai, arXiv, DOI 10.48550/arXiv.2011.05525
[4]
Mnih V, 2013, arXiv, arXiv:1312.5602
[5]
Petrazzini Irving G. B., arXiv, DOI 10.48550/arXiv.2111.02202
[6]
Schulman J., 2015, arXiv, DOI 10.48550/arXiv.1502.05477
[7]
Schulman J, 2017, arXiv, arXiv:1707.06347
[8]
Suilen M, 2024, arXiv, DOI 10.48550/arXiv.2411.11451
[9]
Sutton RS, 2018, ADAPT COMPUT MACH LE, P1
[10]
The MathWorks Inc., Reinforcement Learning Designer