Negative binomial sums of random variables and discounted reward processes

Cited by: 0
Author
Cooper, WL [1]
Affiliation
[1] Georgia Inst Technol, Sch Ind & Syst Engn, Atlanta, GA 30332 USA
Keywords
sums of random variables; reward processes; Markov decision processes
DOI
10.1017/S0021900200016247
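Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract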
Given a sequence of random variables (rewards), the Haviv-Puterman differential equation relates the expected infinite-horizon lambda-discounted reward and the expected total reward up to a random time that is determined by an independent negative binomial random variable with parameters 2 and lambda. This paper provides an interpretation of this proven, but previously unexplained, result. Furthermore, the interpretation is formalized into a new proof, which then yields new results for the general case where the rewards are accumulated up to a time determined by an independent negative binomial random variable with parameters k and lambda.
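As a rough numerical illustration of the relationship described in the abstract (not the paper's proof), the sketch below checks an identity of the form U(lambda) = V(lambda) + lambda (1 - lambda) V'(lambda) for a short deterministic reward sequence, where V is the lambda-discounted total and U is the expected total up to a negative binomial time with parameters 2 and lambda. The conventions are assumptions made here and may differ from the paper's: rewards are indexed from 0, N counts successes (each with probability lambda) before the k-th failure, and rewards are accumulated for n = 0, ..., N. All function names are hypothetical.

```python
import math


def discounted_value(rewards, lam):
    """V(lam) = sum_n lam**n * r_n for a finite (deterministic) reward list."""
    return sum(lam**n * r for n, r in enumerate(rewards))


def discounted_value_derivative(rewards, lam):
    """V'(lam) = sum_{n>=1} n * lam**(n-1) * r_n."""
    return sum(n * lam ** (n - 1) * r for n, r in enumerate(rewards) if n > 0)


def neg_binomial_tail(n, k, lam):
    """P(N >= n), assuming N counts successes (prob. lam) before the k-th failure.

    Under this assumed convention,
    P(N = m) = C(m + k - 1, m) * (1 - lam)**k * lam**m.
    """
    below = sum(
        math.comb(m + k - 1, m) * (1 - lam) ** k * lam**m for m in range(n)
    )
    return 1.0 - below


def total_up_to_random_time(rewards, k, lam):
    """E[sum_{n=0}^{N} r_n] with N independent of the rewards.

    Independence gives E[sum_{n<=N} r_n] = sum_n r_n * P(N >= n).
    """
    return sum(r * neg_binomial_tail(n, k, lam) for n, r in enumerate(rewards))


# Illustrative inputs only; these values are not taken from the paper.
rewards = [1.0, 2.0, 0.5, 3.0]
lam = 0.9

v = discounted_value(rewards, lam)
dv = discounted_value_derivative(rewards, lam)
u2 = total_up_to_random_time(rewards, 2, lam)

# Under the assumed conventions these two printed numbers should agree
# up to floating-point error.
print(u2, v + lam * (1 - lam) * dv)
```

With k = 1 the random time is geometric and, under the same assumptions, `total_up_to_random_time(rewards, 1, lam)` coincides with `discounted_value(rewards, lam)`, which is the familiar reading of discounting as random termination.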
Pages: 589-599
Number of pages: 11