Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes

Cited by: 3
Author
Tadic, VB [1]
Affiliation
[1] Univ Sheffield, Dept Automat Control & Syst Engn, Sheffield S1 3JD, S Yorkshire, England
Keywords
temporal-difference learning; neuro-dynamic programming; reinforcement learning; stochastic approximation; Markov chains
DOI
10.1007/s10994-006-5835-z
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The mean-square asymptotic behavior of temporal-difference learning algorithms with constant step-sizes and linear function approximation is analyzed in this paper. The analysis is carried out for the case of a discounted cost function associated with a Markov chain with a finite-dimensional state-space. Under mild conditions, an upper bound for the asymptotic mean-square error of these algorithms is determined as a function of the step-size. Moreover, under the same assumptions, it is also shown that this bound is linear in the step-size. The main results of the paper are illustrated with examples related to M/G/1 queues and nonlinear AR models with Markov switching.
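The setting described in the abstract can be illustrated with a minimal sketch of TD(0) run with a constant step-size. Everything in it is an assumption made for illustration and is not taken from the paper: the two-state chain, its transition probabilities and costs, the discount factor 0.9, and the one-hot features (the simplest case of linear function approximation). The paper itself treats chains on a finite-dimensional state-space, so a finite chain is a deliberate simplification. The helper names `exact_values` and `td0_mean_square_error` are hypothetical.

```python
import random


def exact_values(transition, cost, discount):
    """Solve V = cost + discount * P V exactly for a 2-state chain (Cramer's rule)."""
    a = 1.0 - discount * transition[0][0]
    b = -discount * transition[0][1]
    c = -discount * transition[1][0]
    d = 1.0 - discount * transition[1][1]
    det = a * d - b * c
    return [(d * cost[0] - b * cost[1]) / det, (a * cost[1] - c * cost[0]) / det]


def td0_mean_square_error(step_size, num_steps=100_000, burn_in=50_000, seed=0):
    """Run constant step-size TD(0); return final weights and the time-averaged
    squared error of the weights after the burn-in period."""
    rng = random.Random(seed)
    # Hypothetical chain, costs and discount factor (not from the paper).
    transition = [[0.9, 0.1], [0.2, 0.8]]
    cost = [1.0, 0.0]
    discount = 0.9
    target = exact_values(transition, cost, discount)

    weights = [0.0, 0.0]
    state = 0
    squared_error_sum = 0.0
    for step in range(num_steps):
        next_state = 0 if rng.random() < transition[state][0] else 1
        # One-hot features: the approximate value of a state is weights[state].
        td_error = cost[state] + discount * weights[next_state] - weights[state]
        weights[state] += step_size * td_error  # the step-size is never decreased
        state = next_state
        if step >= burn_in:
            squared_error_sum += sum((w - v) ** 2 for w, v in zip(weights, target))
    return weights, squared_error_sum / (num_steps - burn_in)


if __name__ == "__main__":
    for alpha in (0.005, 0.01, 0.02):
        _, mse = td0_mean_square_error(alpha)
        print(f"step-size {alpha}: empirical mean-square error {mse:.4f}")
```

Because the step-size stays constant, the weights keep fluctuating around the exact values instead of converging to them. Comparing the printed errors across step-sizes gives an informal, single-run impression of how the error depends on the step-size; it does not verify the paper's bound.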
Pages: 107-133
Number of pages: 27
Related papers
50 in total
  • [1] Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
    Vladislav B. Tadić
    Machine Learning, 2006, 63 : 107 - 133
  • [2] On the asymptotic behavior of a constant stepsize temporal-difference learning algorithm
    Tadic, A
    COMPUTATIONAL LEARNING THEORY, 1999, 1572 : 126 - 137
  • [3] On the worst-case analysis of temporal-difference learning algorithms
    Schapire, RE
    Warmuth, MK
    MACHINE LEARNING, 1996, 22 (1-3) : 95 - 121
  • [4] Stochastic Learning Under Random Reshuffling With Constant Step-Sizes
    Ying, Bicheng
    Yuan, Kun
    Vlaski, Stefan
    Sayed, Ali H.
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2019, 67 (02) : 474 - 489
  • [5] An Analysis of Quantile Temporal-Difference Learning
    Rowland, Mark
    Munos, Remi
    Azar, Mohammad Gheshlaghi
    Tang, Yunhao
    Ostrovski, Georg
    Harutyunyan, Anna
    Tuyls, Karl
    Bellemare, Marc G.
    Dabney, Will
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [6] A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning
    Berthier, Eloise
    Kobeissi, Ziad
    Bach, Francis
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [7] Temporal-Difference Learning for Online Reachability Analysis
    Akametalu, Anayo K.
    Tomlin, Claire J.
    2015 EUROPEAN CONTROL CONFERENCE (ECC), 2015 : 2508 - 2513
  • [8] An analysis of temporal-difference learning with function approximation
    Tsitsiklis, JN
    VanRoy, B
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1997, 42 (05) : 674 - 690
  • [9] Analysis of temporal-difference learning with function approximation
    Tsitsiklis, JN
    VanRoy, B
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 9: PROCEEDINGS OF THE 1996 CONFERENCE, 1997, 9 : 1075 - 1081
  • [10] On the mean-square rate of convergence of temporal-difference learning algorithms
    Tadic, VB
    PROCEEDINGS OF THE 2002 AMERICAN CONTROL CONFERENCE, VOLS 1-6, 2002, 1-6 : 1454 - 1459