Model selection in reinforcement learning

Cited by: 28
Authors
Farahmand, Amir-massoud [1]
Szepesvari, Csaba [1]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
Keywords
Reinforcement learning; Model selection; Complexity regularization; Adaptivity; Offline learning; Off-policy learning; Finite-sample bounds; Policy iteration; Prediction
DOI
10.1007/s10994-011-5254-7
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of model selection in the batch (offline, non-interactive) reinforcement learning setting when the goal is to find an action-value function with the smallest Bellman error among a countable set of candidate functions. We propose a complexity regularization-based model selection algorithm, BERMIN, and prove that it enjoys an oracle-like property: the estimator's error differs from that of an oracle, who selects the candidate with the minimum Bellman error, by only a constant factor and a small remainder term that vanishes at a parametric rate as the number of samples increases. As an application, we consider the case in which the true action-value function belongs to an unknown member of a nested sequence of function spaces. We show that under some additional technical conditions BERMIN leads to a procedure whose rate of convergence, up to a constant factor, matches that of an oracle who knows which of the nested function spaces the true action-value function belongs to, i.e., the procedure achieves adaptivity.
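The general idea of complexity-regularized selection described in the abstract can be illustrated with a minimal sketch. This is not the BERMIN algorithm itself: the function name `select_candidate`, the penalty form `scale * complexity / n_samples`, and the example numbers are all assumptions made for illustration, and the Bellman error estimates are taken as given inputs rather than computed from data.

```python
import math


def select_candidate(estimated_errors, complexities, n_samples, scale=1.0):
    """Pick the candidate minimizing estimated error plus a complexity penalty.

    estimated_errors: per-candidate error estimates (assumed given).
    complexities: per-candidate complexity values (hypothetical measure).
    n_samples: number of samples the estimates were computed from.
    scale: hypothetical penalty constant.
    """
    best_index, best_score = None, math.inf
    for k, (err, comp) in enumerate(zip(estimated_errors, complexities)):
        # Assumed penalty form: shrinks as the sample size grows.
        score = err + scale * comp / n_samples
        if score < best_score:
            best_index, best_score = k, score
    return best_index


# Illustrative inputs only: three candidates of increasing complexity.
errors = [0.5, 0.2, 0.19]
complexities = [1, 2, 50]

small_sample_choice = select_candidate(errors, complexities, n_samples=100)
large_sample_choice = select_candidate(errors, complexities, n_samples=100_000)
```

In this toy setting, the penalty steers the choice toward the simpler candidate when samples are few and allows the more complex candidate once the penalty has shrunk, which is the qualitative behavior a complexity-regularized selector is meant to have.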
Pages: 299-332
Number of pages: 34