48 references in total
- [41] Haarnoja T., Tang H., Abbeel P., Levine S., Reinforcement learning with deep energy-based policies, Proceedings of the International Conference on Machine Learning, pp. 1352-1361, (2017)
- [42] Szepesvári C., The asymptotic convergence-rate of Q-learning, Proceedings of the Advances in Neural Information Processing Systems, pp. 1064-1070, (1998)
- [43] Ma C., Wen J., Bengio Y., (2018)
- [44] Barto A.G., Sutton R.S., Anderson C.W., Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Trans. Syst. Man Cybern., SMC-13, 5, pp. 834-846, (1983)
- [45] Dhariwal P., Hesse C., Klimov O., Nichol A., Plappert M., Radford A., Schulman J., Sidor S., Wu Y., Zhokhov P., (2017)
- [46] Castro P.S., Moitra S., Gelada C., Kumar S., Bellemare M.G., (2018)
- [47] Kornblith S., Shlens J., Le Q.V., Do better ImageNet models transfer better?, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2661-2671, (2019)
- [48] Devlin J., Chang M.-W., Lee K., Toutanova K., (2018)