28 entries in total
[11] Rothblum U. G. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 7, 345-352.
[12] Choi D. S. (1995). Reinforcement learning algorithm for partially observable Markov decision problems. Advances in Neural Information Processing Systems, 16, 185-202.
[13] Van Roy B. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 22, 59-94.
[14] Eaton J. H. (1996). Feature-based methods for large-scale dynamic programming. Machine Learning, 44, 1840-1851.
[15] Zadeh L. A. (1999). Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing financial derivatives. IEEE Transactions on Automatic Control, 40, 1635-1660.
[16] Feinberg E. A. (1969). Discrete dynamic programming with sensitive discount optimality criteria. Annals of Mathematical Statistics.
[17] Jaakkola T. S.
[18] Jordan M. I.
[19] Singh S. P.
[20] Jaakkola T. S.