[41]
Lewis AS (2010) Convergence results for some temporal difference methods based on least squares. Math. Oper. Res. 35, 306-329
[42]
Lions PL (2013) Error bounds for approximations from projected linear equations. Ann. Oper. Res. 208, 95-132
[43]
Mercier B (2012) Q-learning and policy iteration algorithms for stochastic shortest path problems. SIAM J. Control Optim. 50, 3310-3343
[44]
Mnih V. Least squares temporal difference methods: an analysis under general conditions
[45]
Kavukcuoglu K
[46]
Silver D
[47]
Martinet B
[48]
Nedić A
[49]
Bertsekas DP
[50]
Parikh N