Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes

Cited by: 0
Author
Vladislav B. Tadić
Affiliation
[1] University of Sheffield,Department of Automatic Control and Systems Engineering
Source
Machine Learning | 2006 / Vol. 63
Keywords
Temporal-difference learning; Neuro-dynamic programming; Reinforcement learning; Stochastic approximation; Markov chains
DOI
Not available
Abstract
This paper analyzes the mean-square asymptotic behavior of temporal-difference learning algorithms with constant step-sizes and linear function approximation. The analysis is carried out for a discounted cost function associated with a Markov chain with a finite-dimensional state-space. Under mild conditions, an upper bound on the asymptotic mean-square error of these algorithms is determined as a function of the step-size. Under the same assumptions, this bound is shown to be linear in the step-size. The main results are illustrated with examples related to M/G/1 queues and nonlinear AR models with Markov switching.
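As a rough illustration of the setting described in the abstract, the following minimal sketch runs TD(0), the simplest member of the temporal-difference family, with a constant step-size on a small Markov chain. The 3-state chain, the costs, the discount factor, the one-hot features, and the names `true_values` and `tail_mse` are all assumptions made for this example; none of them come from the paper, and the code does not reproduce its analysis or its M/G/1 and AR examples.

```python
import random

# Hypothetical 3-state Markov chain, costs, and discount factor (illustrative only).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
COST = [0.0, 1.0, 2.0]
GAMMA = 0.5
# One-hot (tabular) features, a special case of linear function approximation.
PHI = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def true_values(n_sweeps=200):
    """Approximate the solution of V = c + gamma * P V by fixed-point iteration."""
    v = [0.0] * 3
    for _ in range(n_sweeps):
        v = [COST[i] + GAMMA * sum(P[i][j] * v[j] for j in range(3))
             for i in range(3)]
    return v


def tail_mse(alpha, n_steps=40000, seed=0):
    """Run TD(0) with constant step-size alpha and return the squared error
    ||theta_n - V||^2 averaged over the second half of the run."""
    rng = random.Random(seed)
    v_star = true_values()
    theta = [0.0] * 3
    state = 0
    total, count = 0.0, 0
    for n in range(n_steps):
        # Sample the next state from row `state` of the transition matrix.
        nxt = rng.choices(range(3), weights=P[state])[0]
        v_s = sum(t * f for t, f in zip(theta, PHI[state]))
        v_n = sum(t * f for t, f in zip(theta, PHI[nxt]))
        delta = COST[state] + GAMMA * v_n - v_s  # temporal-difference error
        theta = [t + alpha * delta * f for t, f in zip(theta, PHI[state])]
        if n >= n_steps // 2:
            total += sum((t - v) ** 2 for t, v in zip(theta, v_star))
            count += 1
        state = nxt
    return total / count


if __name__ == "__main__":
    for alpha in (0.5, 0.05, 0.01):
        print(alpha, tail_mse(alpha))
```

With a constant step-size the iterates keep fluctuating around the fixed point instead of converging to it, so the tail-averaged squared error serves here as an empirical stand-in for the asymptotic mean-square error. Comparing the printed values across step-sizes gives an informal feel for how that error shrinks as the step-size decreases.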
Pages: 107-133
Page count: 26
Related papers
50 in total
  • [1] Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
    Tadic, VB
    MACHINE LEARNING, 2006, 63 (02) : 107 - 133
  • [2] On the asymptotic behavior of a constant stepsize temporal-difference learning algorithm
    Tadic, A
    COMPUTATIONAL LEARNING THEORY, 1999, 1572 : 126 - 137
  • [3] On the worst-case analysis of temporal-difference learning algorithms
    Schapire, RE
    Warmuth, MK
    MACHINE LEARNING, 1996, 22 (1-3) : 95 - 121
  • [4] Stochastic Learning Under Random Reshuffling With Constant Step-Sizes
    Ying, Bicheng
    Yuan, Kun
    Vlaski, Stefan
    Sayed, Ali H.
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2019, 67 (02) : 474 - 489
  • [5] An Analysis of Quantile Temporal-Difference Learning
    Rowland, Mark
    Munos, Remi
    Azar, Mohammad Gheshlaghi
    Tang, Yunhao
    Ostrovski, Georg
    Harutyunyan, Anna
    Tuyls, Karl
    Bellemare, Marc G.
    Dabney, Will
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [6] A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning
    Berthier, Eloise
    Kobeissi, Ziad
    Bach, Francis
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [7] Temporal-Difference Learning for Online Reachability Analysis
    Akametalu, Anayo K.
    Tomlin, Claire J.
    2015 EUROPEAN CONTROL CONFERENCE (ECC), 2015 : 2508 - 2513
  • [8] An analysis of temporal-difference learning with function approximation
    Tsitsiklis, JN
    VanRoy, B
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1997, 42 (05) : 674 - 690
  • [9] Analysis of temporal-difference learning with function approximation
    Tsitsiklis, JN
    VanRoy, B
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 9: PROCEEDINGS OF THE 1996 CONFERENCE, 1997, 9 : 1075 - 1081
  • [10] On the mean-square rate of convergence of temporal-difference learning algorithms
    Tadic, VB
    PROCEEDINGS OF THE 2002 AMERICAN CONTROL CONFERENCE, VOLS 1-6, 2002, 1-6 : 1454 - 1459