Bias and variance approximation in value function estimates

Cited by: 92
Authors
Mannor, Shie [1]
Simester, Duncan [2]
Sun, Peng [3]
Tsitsiklis, John N. [4]
Affiliations
[1] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ H3A 2A7, Canada
[2] MIT, Sloan Sch Management, Cambridge, MA 02139 USA
[3] Duke Univ, Fuqua Sch Business, Durham, NC 27708 USA
[4] MIT, Informat & Decis Syst Lab, Cambridge, MA 02139 USA
Keywords
value function; confidence interval; variance; bias; DYNAMIC-PROGRAMMING MODELS; MARKOV DECISION-PROCESSES; MANAGEMENT
DOI
10.1287/mnsc.1060.0614
Chinese Library Classification number
C93 [Management Science]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
We consider a finite-state, finite-action, infinite-horizon, discounted reward Markov decision process and study the bias and variance in the value function estimates that result from empirical estimates of the model parameters. We provide closed-form approximations for the bias and variance, which can then be used to derive confidence intervals around the value function estimates. We illustrate and validate our findings using a large database describing the transaction and mailing histories for customers of a mail-order catalog firm.
Pages: 308-322
Number of pages: 15