Learning rates for Q-learning

Cited by: 0
Authors
Even-Dar, E [1 ]
Mansour, Y [1 ]
Affiliation
[1] Tel Aviv Univ, Sch Comp Sci, IL-69978 Tel Aviv, Israel
Keywords
Reinforcement Learning; Q-learning; stochastic processes; convergence bounds; learning rates
DOI
Not available
CLC Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
In this paper we derive convergence rates for Q-learning. We show an interesting relationship between the convergence rate and the learning rate used in Q-learning. For a polynomial learning rate, one which is 1/t^omega at time t where omega ∈ (1/2, 1), we show that the convergence rate is polynomial in 1/(1-gamma), where gamma is the discount factor. In contrast we show that for a linear learning rate, one which is 1/t at time t, the convergence rate has an exponential dependence on 1/(1-gamma). In addition we show a simple example that proves this exponential behavior is inherent for linear learning rates.
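The two learning-rate schedules contrasted in the abstract can be sketched with a minimal tabular Q-learning loop. This is an illustrative sketch only: the 2-state MDP, the reward values, gamma = 0.9, omega = 0.6, and the episode count are all assumptions for demonstration, not the paper's construction.

```python
import random

GAMMA = 0.9  # discount factor gamma (illustrative choice)

# Hypothetical 2-state, 2-action MDP: action 0 stays put with reward 0;
# action 1 switches state, paying reward 1 only when leaving state 0.
def step(s, a):
    if a == 0:
        return s, 0.0
    return 1 - s, (1.0 if s == 0 else 0.0)

def q_learning(schedule, omega=0.6, steps=20000, seed=0):
    """Tabular Q-learning with a per-(state, action) step count t and
    either a polynomial rate 1/t^omega or a linear rate 1/t."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]
    visits = [[0, 0], [0, 0]]
    s = 0
    for _ in range(steps):
        a = rng.randrange(2)          # uniform exploration
        s2, r = step(s, a)
        visits[s][a] += 1
        t = visits[s][a]
        alpha = 1.0 / t if schedule == "linear" else 1.0 / t**omega
        # Standard Q-learning update toward the bootstrapped target.
        Q[s][a] += alpha * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2
    return Q

Q_poly = q_learning("polynomial")
Q_lin = q_learning("linear")
```

On this deterministic toy MDP the optimal value is Q*(0, switch) = 1/(1 - gamma^2) ≈ 5.26; the polynomial schedule approaches it noticeably faster than the linear 1/t schedule, mirroring the polynomial-versus-exponential dependence on 1/(1-gamma) shown in the paper.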
Pages: 1-25