113 entries in total
[1]
Abtahi F., 2011, P IEEE ICDL EPIROB F
[3]
[Anonymous], 2006, AAAI
[4]
[Anonymous], 2005, P 22 INT C MACH LEAR, DOI 10.1145/1102351.1102421
[5]
Antoniou A., 2007, PROC NEURAL INF PROC, P1, DOI 10.1007/978-0-387-71107-2_1
[7]
Value-iteration based fitted policy iteration: Learning with a single trajectory [J]. 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007: 330-+
[10]
Bellman R., 1957, DYNAMIC PROGRAMMING