- [1] Bradtke S. J., Barto A. G. (1996) Linear least-squares algorithms for temporal difference learning. Machine Learning 22, 33-57.
- [2] Moore A. W., Atkeson C. G. (1993) Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning 13, 103-130.
- [3] Tesauro G. (1994) TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation 6, 215-219.
- [4] Tsitsiklis J. N., Van Roy B. (1997) An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control 42, 674-690.