Learning rates for Q-learning

Cited by: 0
Authors
Even-Dar, E [1 ]
Mansour, Y [1 ]
Affiliation
[1] Tel Aviv Univ, Sch Comp Sci, IL-69978 Tel Aviv, Israel
Keywords
Reinforcement Learning; Q-learning; stochastic processes; convergence bounds; learning rates
DOI
Not available
CLC Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
In this paper we derive convergence rates for Q-learning. We show an interesting relationship between the convergence rate and the learning rate used in Q-learning. For a polynomial learning rate, one which is 1/t^omega at time t where omega ∈ (1/2, 1), we show that the convergence rate is polynomial in 1/(1-gamma), where gamma is the discount factor. In contrast we show that for a linear learning rate, one which is 1/t at time t, the convergence rate has an exponential dependence on 1/(1-gamma). In addition we show a simple example that proves this exponential behavior is inherent for linear learning rates.
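The two learning-rate schedules contrasted in the abstract can be sketched with a minimal tabular Q-learning loop. This is an illustrative sketch only: the 2-state MDP, the reward values, gamma = 0.9, omega = 0.6, and the episode count are all assumptions for demonstration, not the paper's construction.

```python
import random

GAMMA = 0.9  # discount factor gamma (illustrative choice)

# Hypothetical 2-state, 2-action MDP: action 0 stays put with reward 0;
# action 1 switches state, paying reward 1 only when leaving state 0.
def step(s, a):
    if a == 0:
        return s, 0.0
    return 1 - s, (1.0 if s == 0 else 0.0)

def q_learning(schedule, omega=0.6, steps=20000, seed=0):
    """Tabular Q-learning with a per-(state, action) step count t and
    either a polynomial rate 1/t^omega or a linear rate 1/t."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]
    visits = [[0, 0], [0, 0]]
    s = 0
    for _ in range(steps):
        a = rng.randrange(2)          # uniform exploration
        s2, r = step(s, a)
        visits[s][a] += 1
        t = visits[s][a]
        alpha = 1.0 / t if schedule == "linear" else 1.0 / t**omega
        # Standard Q-learning update toward the bootstrapped target.
        Q[s][a] += alpha * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2
    return Q

Q_poly = q_learning("polynomial")
Q_lin = q_learning("linear")
```

On this deterministic toy MDP the optimal value is Q*(0, switch) = 1/(1 - gamma^2) ≈ 5.26; the polynomial schedule approaches it noticeably faster than the linear 1/t schedule, mirroring the polynomial-versus-exponential dependence on 1/(1-gamma) shown in the paper.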
Pages: 1-25