Error bounds for constant step-size Q-learning

被引:48
作者
Beck, C. L. [1 ,2 ]
Srikant, R. [2 ,3 ]
机构
[1] Univ Illinois, Dept Ind & Enterprise Syst Engn, Urbana, IL USA
[2] Univ Illinois, Coordinated Sci Lab, Urbana, IL USA
[3] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL USA
关键词
Q-learning; Markov decision processes; Stochastic approximation; STOCHASTIC-APPROXIMATION;
D O I
10.1016/j.sysconle.2012.08.014
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We provide a bound on the first moment of the error in the Q-function estimate resulting from fixed step-size algorithms applied to finite state-space, discounted reward Markov decision problems. Motivated by Tsitsiklis' proof for the decreasing step-size case, we decompose the Q-learning update equations into a dynamical system driven by a noise sequence and another dynamical system whose state variable is the Q-learning error, i.e., the difference between the true Q-function and the estimate. A natural persistence of excitation condition allows us to sample the system periodically and derive a simple scalar difference equation from which the convergence properties and bounds on the error of the Q-learning algorithm can be derived. (C) 2012 Elsevier B.V. All rights reserved.
引用
收藏
页码:1203 / 1208
页数:6
相关论文
共 14 条
[1]   Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming [J].
Bertsekas, Dimitri P. ;
Yu, Huizhen .
MATHEMATICS OF OPERATIONS RESEARCH, 2012, 37 (01) :66-94
[2]  
Borkar V.S., 2000, 38 ALL C COMM CONTR
[3]   The ODE method for convergence of stochastic approximation and reinforcement learning [J].
Borkar, VS ;
Meyn, SP .
SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2000, 38 (02) :447-469
[4]  
Borkar VS., 2009, Stochastic Approximation: A Dynamical Systems Viewpoint
[5]  
Busoniu L, 2010, AUTOM CONTROL ENG SE, P1, DOI 10.1201/9781439821091-f
[6]  
Even-Dar E, 2003, J MACH LEARN RES, V5, P1
[7]   Boundedness of iterates in Q-Learning [J].
Gosavi, A .
SYSTEMS & CONTROL LETTERS, 2006, 55 (04) :347-349
[8]  
JAAKKOLA T, 1994, NEURAL COMPUTATION, V6
[9]  
Kushner H. J., 1997, STOCHASTIC APPROXIMA
[10]  
Levy K, 2006, WODES 2006: EIGHTH INTERNATIONAL WORKSHOP ON DISCRETE EVENT SYSTEMS, PROCEEDINGS, P372