17 entries in total
- [1] Bhatnagar S, 2007, Adv. Neural Inf. Process. Syst., V20, P105
- [3] Technical update: Least-squares temporal difference learning [J]. Machine Learning, 2002, 49 (2-3): 233-246
- [4] Bradtke SJ, 1996, MACH LEARN, V22, P33, DOI 10.1007/BF00114723
- [5] Geramifard A., 2006, P ADV NEUR INF PROC, P826
- [6] Geramifard A., 2006, Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06), V21, P356
- [8] On actor-critic algorithms [J]. SIAM Journal on Control and Optimization, 2003, 42 (04): 1143-1166
- [9] Least-squares policy iteration [J]. Journal of Machine Learning Research, 2004, 4 (06): 1107-1149
- [10] Ljung L., 1987, THEORY PRACTICE RECU