99 entries in total
[1]
Antos A(2008)Fitted Q-iteration in continuous action-space MDPs Advances in Neural Information Processing Systems 20 9-16
[2]
Szepesvári C(2005)Dynamic programming and optimal control European Journal of Control 11 4-5
[3]
Munos R(2008)Incremental natural actor-critic algorithms Advances in Neural Information Processing Systems 20 105-112
[4]
Bertsekas DP(2009)Natural actor-critic algorithms Automatica 45 2471-2482
[5]
Bhatnagar S(1997)Stochastic approximation with two time scales Systems & Control Letters 29 291-294
[6]
Ghavamzadeh M(2000)The ODE method for convergence of stochastic approximation and reinforcement learning SIAM Journal on Control and Optimization 38 447-469
[7]
Lee M(1998)Online learning and stochastic approximations On-line Learning in Neural Networks 17 142-526
[8]
Sutton RS(2002)Stability and generalization Journal of Machine Learning Research 2 499-376
[9]
Bhatnagar S(1995)Generalization in reinforcement learning: Safely approximating the value function Advances in Neural Information Processing Systems 7 369-5
[10]
Sutton R(2019)Neural temporal-difference learning converges to global optima Advances in Neural Information Processing Systems 32 4-410