Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes

Cited by: 3

Authors:

Tadic, VB [1]

Affiliations:

[1] Univ Sheffield, Dept Automat Control & Syst Engn, Sheffield S1 3JD, S Yorkshire, England

Source:

MACHINE LEARNING | 2006 / Vol. 63 / Issue 02

Keywords:

temporal-difference learning; neuro-dynamic programming; reinforcement learning; stochastic approximation; Markov chains;

DOI:

10.1007/s10994-006-5835-z

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The mean-square asymptotic behavior of temporal-difference learning algorithms with constant step-sizes and linear function approximation is analyzed in this paper. The analysis is carried out for the case of a discounted cost function associated with a Markov chain with a finite-dimensional state-space. Under mild conditions, an upper bound for the asymptotic mean-square error of these algorithms is determined as a function of the step-size. Moreover, under the same assumptions, it is shown that this bound is linear in the step-size. The main results of the paper are illustrated with examples related to M/G/1 queues and nonlinear AR models with Markov switching.
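
For readers unfamiliar with the setting, the following is a minimal sketch of a temporal-difference update with linear function approximation and a constant step-size. It is not the paper's own construction. It assumes TD(0), the simplest member of the family, and a hypothetical two-state Markov chain whose transition matrix `P`, costs `COST`, features `PHI`, and discount factor `GAMMA` are all made up for illustration. The helper names (`true_values`, `td0_constant_step`, `squared_error`) are likewise hypothetical.

```python
import random

# Hypothetical two-state Markov chain; all numbers are illustrative assumptions.
P = [[0.9, 0.1], [0.2, 0.8]]      # transition probabilities
COST = [1.0, 0.0]                 # one-step cost incurred in each state
PHI = [[1.0, 0.0], [0.0, 1.0]]    # feature vectors (one-hot, so no approximation bias)
GAMMA = 0.9                       # discount factor


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def true_values(n_iter=2000):
    """Discounted cost of each state, by iterating V = c + gamma * P V."""
    v = [0.0, 0.0]
    for _ in range(n_iter):
        v = [COST[i] + GAMMA * dot(P[i], v) for i in range(2)]
    return v


def td0_constant_step(alpha, n_steps, seed=0):
    """Run TD(0) with linear features and a fixed step-size alpha; return theta."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    x = 0
    for _ in range(n_steps):
        # Sample the next state of the chain.
        y = 0 if rng.random() < P[x][0] else 1
        # Temporal-difference error for the observed transition x -> y.
        delta = COST[x] + GAMMA * dot(theta, PHI[y]) - dot(theta, PHI[x])
        # Constant step-size update along the feature vector of x.
        theta = [t + alpha * delta * f for t, f in zip(theta, PHI[x])]
        x = y
    return theta


def squared_error(theta, v):
    """Sum over states of the squared gap between theta'phi(x) and V(x)."""
    return sum((dot(theta, PHI[i]) - v[i]) ** 2 for i in range(2))


if __name__ == "__main__":
    v = true_values()
    for alpha in (0.05, 0.005):
        theta = td0_constant_step(alpha, n_steps=200_000)
        print(alpha, squared_error(theta, v))
```

Because the step-size never decays, the iterates keep fluctuating around the target instead of converging to it. The abstract's result concerns how large those fluctuations are in mean square as a function of the step-size; a single run such as the one above only gives one noisy sample of that error.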

Pages: 107-133

Page count: 27
