107 entries in total
[1]
RL Competition, (2012)
[2]
Ahmadi M., Taylor M.E., Stone P., IFSA: Incremental feature-set augmentation for reinforcement learning tasks, International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 1-8, (2007)
[3]
Antos A., Munos R., Szepesvari C., Fitted Q-iteration in continuous action-space MDPs, Proceedings of Neural Information Processing Systems Conference (NIPS), (2007)
[4]
Antos A., Szepesvari C., Munos R., Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, 71, 1, pp. 89-129, (2008)
[5]
Asmuth J., Li L., Littman M., Nouri A., Wingate D., A Bayesian sampling approach to exploration in reinforcement learning, International Conference on Uncertainty in Artificial Intelligence (UAI), pp. 19-26, (2009)
[6]
Baird L.C., Residual algorithms: Reinforcement learning with function approximation, International Conference on Machine Learning (ICML), pp. 30-37, (1995)
[7]
Barreto A.D.M.S., Anderson C.W., Restricted gradient-descent algorithm for value-function approximation in reinforcement learning, Artificial Intelligence, 172, pp. 454-482, (2008)
[8]
Barto A., Duff M., Monte Carlo matrix inversion and reinforcement learning, Neural Information Processing Systems (NIPS), pp. 687-694, (1994)
[9]
Barto A., Bradtke S., Singh S., Learning to act using real-time dynamic programming, Artificial Intelligence, 72, pp. 81-138, (1995)
[10]
Baxter J., Bartlett P., Direct gradient-based reinforcement learning, Circuits and Systems, (2000)