32 entries in total
[1]
Antos A. (2008) Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path Machine Learning 71 89-129
[2]
Arlot S. (2009) A survey of cross-validation procedures for model selection Statistics Surveys 4 40-79
[3]
Bartlett P. L. (2002) Model selection and error estimation Machine Learning 48 85-113
[4]
Bartlett P. L. (2005) Local Rademacher complexities Annals of Statistics 33 1497-1537
[5]
Ernst D. (2005) Tree-based batch mode reinforcement learning Journal of Machine Learning Research 6 503-556
[6]
Lagoudakis M. G. (2003) Least-squares policy iteration Journal of Machine Learning Research 4 1107-1149
[7]
Lugosi G. (2004) Complexity regularization via localized random penalties Annals of Statistics 32 1679-1697
[8]
Meir R. (2000) Nonparametric time series prediction through adaptive model selection Machine Learning 39 5-34
[9]
Menache I. (2005) Basis function adaptation in temporal difference reinforcement learning Annals of Operations Research 134 215-238
[10]
Modha D. S. (1998) Memory-universal prediction of stationary random processes IEEE Transactions on Information Theory 44 117-133