Model selection in reinforcement learning

Times cited: 28

Authors:

Farahmand, Amir-massoud [1]

Szepesvari, Csaba [1]

Affiliation:

[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada

Source:

MACHINE LEARNING | 2011 / Vol. 85 / No. 3

Keywords:

Reinforcement learning; Model selection; Complexity regularization; Adaptivity; Offline learning; Off-policy learning; Finite-sample bounds; POLICY ITERATION; PREDICTION

DOI:

10.1007/s10994-011-5254-7

CLC classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We consider the problem of model selection in the batch (offline, non-interactive) reinforcement learning setting, where the goal is to find an action-value function with the smallest Bellman error among a countable set of candidate functions. We propose a complexity regularization-based model selection algorithm, BERMIN, and prove that it enjoys an oracle-like property: the estimator's error differs from that of an oracle, who selects the candidate with the minimum Bellman error, by only a constant factor and a small remainder term that vanishes at a parametric rate as the number of samples increases. As an application, we consider the problem where the true action-value function belongs to an unknown member of a nested sequence of function spaces. We show that, under some additional technical conditions, BERMIN leads to a procedure whose rate of convergence, up to a constant factor, matches that of an oracle who knows which of the nested function spaces the true action-value function belongs to; that is, the procedure achieves adaptivity.
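The general idea of complexity-regularized selection over a countable list of candidates can be illustrated with a minimal sketch. This is not BERMIN as specified in the paper. It assumes each candidate already comes with an estimate of its squared Bellman error (how such estimates are obtained from batch data is not modeled here). It then picks the candidate minimizing estimate plus penalty. The function name `penalized_select`, the penalty form, its 1/n scaling, and the constants `c` and `delta` are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def penalized_select(
    est_errors: Sequence[float],
    complexities: Sequence[float],
    n: int,
    c: float = 1.0,
    delta: float = 0.05,
) -> int:
    """Hypothetical complexity-regularized selection rule (not the paper's BERMIN).

    est_errors[k]   -- assumed, already-computed estimate of candidate k's squared Bellman error
    complexities[k] -- assumed complexity measure of candidate k (e.g. a dimension)
    n               -- number of samples; the assumed penalty shrinks like 1/n
    Returns the index k minimizing est_errors[k] + penalty(k).
    """
    if not est_errors or len(est_errors) != len(complexities):
        raise ValueError("need one complexity value per candidate, and at least one candidate")
    if n <= 0:
        raise ValueError("n must be positive")

    best_k, best_score = 0, math.inf
    for k, (err, d) in enumerate(zip(est_errors, complexities)):
        # log((k+1)(k+2)) is the price of searching a countable list: the weights
        # 1/((k+1)(k+2)) telescope and sum to 1 over k = 0, 1, 2, ...
        pen = c * (d + math.log((k + 1) * (k + 2)) + math.log(1.0 / delta)) / n
        score = err + pen
        if score < best_score:  # keep the first minimizer on ties
            best_k, best_score = k, score
    return best_k


# Usage: richer candidates have slightly smaller estimated errors but larger complexity.
errors = [0.50, 0.20, 0.18, 0.179]
dims = [1, 3, 10, 40]
print(penalized_select(errors, dims, n=200))  # → 1
```

With few samples the penalty dominates and a simple candidate wins. As `n` grows, the penalty fades and the choice drifts toward the candidate with the smallest estimated error (index 3 here for `n=100000`). This mirrors, loosely, the trade-off that an oracle-type guarantee formalizes.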


Pages: 299-332

Page count: 34
