Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes

Cited by: 3

Authors:

Tadic, VB [1]

Affiliations:

[1] Univ Sheffield, Dept Automat Control & Syst Engn, Sheffield S1 3JD, S Yorkshire, England

Source:

MACHINE LEARNING | 2006 / Vol. 63 / Issue 02

Keywords:

temporal-difference learning; neuro-dynamic programming; reinforcement learning; stochastic approximation; Markov chains;

DOI:

10.1007/s10994-006-5835-z

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The mean-square asymptotic behavior of temporal-difference learning algorithms with constant step-sizes and linear function approximation is analyzed in this paper. The analysis is carried out for the case of discounted cost function associated with a Markov chain with a finite dimensional state-space. Under mild conditions, an upper bound for the asymptotic mean-square error of these algorithms is determined as a function of the step-size. Moreover, under the same assumptions, it is also shown that this bound is linear in the step-size. The main results of the paper are illustrated with examples related to M/G/1 queues and nonlinear AR models with Markov switching.
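
For orientation, the kind of update the abstract refers to can be sketched as follows. This is a minimal illustration and not the paper's algorithm or notation: it assumes the simplest variant, TD(0), on a finite-state chain, whereas the paper's setting is more general. The names `td0_update` and `run_td0`, the transition matrix `P`, the per-state `costs`, and the `features` table are all hypothetical, chosen only for this example. The step-size `alpha` is held constant, so the iterate keeps fluctuating instead of settling, which is the regime whose mean-square error the abstract bounds.

```python
import random


def td0_update(theta, phi, phi_next, cost, gamma, alpha):
    """One TD(0) step with linear approximation V(x) ~ theta . phi(x).

    alpha is a constant step-size; gamma is the discount factor.
    """
    v = sum(t * f for t, f in zip(theta, phi))
    v_next = sum(t * f for t, f in zip(theta, phi_next))
    delta = cost + gamma * v_next - v  # temporal-difference error
    return [t + alpha * delta * f for t, f in zip(theta, phi)]


def run_td0(P, costs, features, gamma, alpha, n_steps, seed=0):
    """Simulate a finite Markov chain and apply TD(0) along the trajectory.

    P[i][j] is the transition probability from state i to state j,
    costs[i] is the one-step cost in state i, and features[i] is the
    feature vector of state i (all hypothetical inputs).
    """
    rng = random.Random(seed)
    state = 0
    theta = [0.0] * len(features[0])
    for _ in range(n_steps):
        next_state = rng.choices(range(len(P)), weights=P[state])[0]
        theta = td0_update(theta, features[state], features[next_state],
                           costs[state], gamma, alpha)
        state = next_state
    return theta


if __name__ == "__main__":
    # Hypothetical 3-state chain with 2-dimensional features.
    P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
    costs = [1.0, 0.0, 2.0]
    features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    for alpha in (0.1, 0.01):
        print(alpha, run_td0(P, costs, features, 0.9, alpha, 20000))
```

Running the script with different values of `alpha` gives a rough feel for how the size of the fluctuations depends on the step-size; it does not reproduce the paper's bound or its examples.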

Pages: 107-133

Page count: 27
