共 51 条
- [1] Bharadhwaj H., Xie K., Shkurti F., Model-predictive control via cross-entropy and gradient-based optimization, Proc. Learn. Dyn. Control, pp. 277-286, (2020)
- [2] Botvinick M., Toussaint M., Planning as inference, Trends Cogn. Sci., 16, 10, pp. 485-488, (2012)
- [3] Caterini A.L., Doucet A., Sejdinovic D., Hamiltonian variational auto-encoder, Proc. Adv. Neural Inf. Process. Syst., pp. 8167-8177, (2018)
- [4] Chow Y., Nachum O., Duenez-Guzman E., Ghavamzadeh M., A lyapunov-based approach to safe reinforcement learning, Adv. Neural Inf. Process. Syst., 31, (2018)
- [5] Chow Y., Nachum O., Faust A., Duenez-Guzman E., Ghavamzadeh M., Lyapunov-based Safe Policy Optimization for Continuous Control, (2019)
- [6] Chua K., Calandra R., McAllister R., Levine S., Deep reinforcement learning in a handful of trials using probabilistic dynamics models, Proc. 32nd Int. Conf. Neural Inf. Process. Syst., pp. 4759-4770, (2018)
- [7] Ciosek K., Vuong Q., Loftin R., Hofmann K., Better exploration with optimistic actor critic, Proc. Adv. Neural Inf. Process. Syst., 32, pp. 1787-1798, (2019)
- [8] Coumans E., Bai Y., PyBullet, a Python module for physics simulation for games, robotics and machine learning, (2016)
- [9] Cremer C., Li X., Duvenaud D., Inference suboptimality in variational autoencoders, Proc. Int. Conf. Mach. Learn., pp. 1078-1086, (2018)
- [10] Dayan P., Hinton G.E., Using expectation-maximization for reinforcement learning, Neural Comput., 9, 2, pp. 271-278, (1997)