107 entries in total
[91]
Sutton R.S., Maei H.R., Precup D., Bhatnagar S., Silver D., Szepesvari C., Wiewiora E., Fast gradient-descent methods for temporal-difference learning with linear function approximation, International Conference on Machine Learning (ICML), pp. 993-1000, (2009)
[92]
Szepesvari C., Algorithms for reinforcement learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, (2010)
[93]
Szita I., Szepesvari C., Model-based reinforcement learning with nearly tight exploration complexity bounds, International Conference on Machine Learning (ICML), pp. 1031-1038, (2010)
[94]
Taylor G., Parr R., Kernelized value function approximation for reinforcement learning, International Conference on Machine Learning (ICML), pp. 1017-1024, (2009)
[95]
Tsitsiklis J.N., Van Roy B., An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, 42, 5, pp. 674-690, (1997)
[96]
Tsitsiklis J.N., Van Roy B., Average cost temporal-difference learning, Automatica, 35, 11, pp. 1799-1808, (1999)
[97]
Ure N.K., Geramifard A., Chowdhary G., How J.P., Adaptive planning for Markov decision processes with uncertain transition models via incremental feature dependency discovery, European Conference on Machine Learning (ECML), (2012)
[98]
Watkins C.J., Models of Delayed Reinforcement Learning, (1989)
[99]
Watkins C.J., Q-learning, Machine Learning, 8, 3, pp. 279-292, (1992)
[100]
Watkins C.J.C.H., Dayan P., Q-learning, Machine Learning, 8, 3, pp. 279-292, (1992)