Error bounds for constant step-size Q-learning

被引：48

作者：

Beck, C. L. ^{[1
,2
]}

Srikant, R. ^{[2
,3
]}

机构：

[1] Univ Illinois, Dept Ind & Enterprise Syst Engn, Urbana, IL USA

[2] Univ Illinois, Coordinated Sci Lab, Urbana, IL USA

[3] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL USA

来源：

SYSTEMS & CONTROL LETTERS | 2012年 / 61卷 / 12期

关键词：

Q-learning; Markov decision processes; Stochastic approximation; STOCHASTIC-APPROXIMATION;

D O I：

10.1016/j.sysconle.2012.08.014

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

We provide a bound on the first moment of the error in the Q-function estimate resulting from fixed step-size algorithms applied to finite state-space, discounted reward Markov decision problems. Motivated by Tsitsiklis' proof for the decreasing step-size case, we decompose the Q-learning update equations into a dynamical system driven by a noise sequence and another dynamical system whose state variable is the Q-learning error, i.e., the difference between the true Q-function and the estimate. A natural persistence of excitation condition allows us to sample the system periodically and derive a simple scalar difference equation from which the convergence properties and bounds on the error of the Q-learning algorithm can be derived. (C) 2012 Elsevier B.V. All rights reserved.

引用

页码：1203 / 1208

页数：6

共 14 条

[1] Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming [J].

Bertsekas, Dimitri P. ;

Yu, Huizhen .

MATHEMATICS OF OPERATIONS RESEARCH, 2012, 37 (01) :66-94

[2]

Borkar V.S., 2000, 38 ALL C COMM CONTR

[3] The ODE method for convergence of stochastic approximation and reinforcement learning [J].

Borkar, VS ;

Meyn, SP .

SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2000, 38 (02) :447-469

[4]

Borkar VS., 2009, Stochastic Approximation: A Dynamical Systems Viewpoint

[5]

Busoniu L, 2010, AUTOM CONTROL ENG SE, P1, DOI 10.1201/9781439821091-f

[6]

Even-Dar E, 2003, J MACH LEARN RES, V5, P1

[7] Boundedness of iterates in Q-Learning [J].

Gosavi, A .

SYSTEMS & CONTROL LETTERS, 2006, 55 (04) :347-349

[8]

JAAKKOLA T, 1994, NEURAL COMPUTATION, V6

[9]

Kushner H. J., 1997, STOCHASTIC APPROXIMA

[10]

Levy K, 2006, WODES 2006: EIGHTH INTERNATIONAL WORKSHOP ON DISCRETE EVENT SYSTEMS, PROCEEDINGS, P372

← 1 2 →