Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes

Cited by: 3
Authors
Tadic, VB [1 ]
Affiliation
[1] Univ Sheffield, Dept Automat Control & Syst Engn, Sheffield S1 3JD, S Yorkshire, England
Keywords
temporal-difference learning; neuro-dynamic programming; reinforcement learning; stochastic approximation; Markov chains;
DOI
10.1007/s10994-006-5835-z
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The mean-square asymptotic behavior of temporal-difference learning algorithms with constant step-sizes and linear function approximation is analyzed in this paper. The analysis is carried out for the case of a discounted cost function associated with a Markov chain with a finite-dimensional state space. Under mild conditions, an upper bound on the asymptotic mean-square error of these algorithms is determined as a function of the step-size. Moreover, under the same assumptions, it is shown that this bound is linear in the step-size. The main results of the paper are illustrated with examples related to M/G/1 queues and nonlinear AR models with Markov switching.
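For readers unfamiliar with the algorithm class analyzed in the paper, the following is a minimal sketch of TD(0) with linear function approximation and a constant step-size; all names, parameters, and the feature/reward interface below are illustrative assumptions, not taken from the paper itself. With a fixed step-size alpha the iterates do not converge almost surely, but the paper's result is that their asymptotic mean-square error admits an upper bound that is linear in alpha.

```python
import numpy as np

def td0_constant_step(features, rewards, next_features, gamma=0.95, alpha=0.01, theta0=None):
    """TD(0) with linear value approximation V(x) ~ theta^T phi(x) and constant step-size alpha.

    features[t]      : feature vector phi(X_t) of the current state
    next_features[t] : feature vector phi(X_{t+1}) of the successor state
    rewards[t]       : observed one-step cost/reward
    gamma            : discount factor of the discounted cost criterion
    """
    d = features.shape[1]
    theta = np.zeros(d) if theta0 is None else np.array(theta0, dtype=float)
    for phi, r, phi_next in zip(features, rewards, next_features):
        # Temporal-difference error: delta_t = r_t + gamma * theta^T phi(X_{t+1}) - theta^T phi(X_t)
        delta = r + gamma * theta @ phi_next - theta @ phi
        # Constant step-size update; the bound discussed in the abstract concerns
        # the asymptotic mean-square error of these iterates as a function of alpha.
        theta = theta + alpha * delta * phi
    return theta
```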
Pages: 107-133
Page count: 27
Related papers
50 records in total
  • [21] On Generalized Bellman Equations and Temporal-Difference Learning
    Yu, Huizhen
    Mahmood, A. Rupam
    Sutton, Richard S.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2018, 19
  • [22] Gradient Temporal-Difference Learning with Regularized Corrections
    Ghiassian, Sina
    Patterson, Andrew
    Garg, Shivam
    Gupta, Dhawal
    White, Adam
    White, Martha
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [23] Relative Loss Bounds for Temporal-Difference Learning
    Jürgen Forster
    Manfred K. Warmuth
    Machine Learning, 2003, 51 : 23 - 50
  • [24] Nonlinear Distributional Gradient Temporal-Difference Learning
    Qu, Chao
    Mannor, Shie
    Xu, Huan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [25] Temporal-Difference Reinforcement Learning with Distributed Representations
    Kurth-Nelson, Zeb
    Redish, A. David
    PLOS ONE, 2009, 4 (10):
  • [26] Gradient Temporal-Difference Learning with Regularized Corrections
    Ghiassian, Sina
    Patterson, Andrew
    Garg, Shivam
    Gupta, Dhawal
    White, Adam
    White, Martha
    25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019,
  • [27] Relative loss bounds for temporal-difference learning
    Forster, J
    Warmuth, MK
    MACHINE LEARNING, 2003, 51 (01) : 23 - 50
  • [28] Approximate value iteration and temporal-difference learning
    de Farias, DP
    Van Roy, B
    IEEE 2000 ADAPTIVE SYSTEMS FOR SIGNAL PROCESSING, COMMUNICATIONS, AND CONTROL SYMPOSIUM - PROCEEDINGS, 2000, : 48 - 51
  • [29] Target-Based Temporal-Difference Learning
    Lee, Donghwan
    He, Niao
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [30] New Versions of Gradient Temporal-Difference Learning
    Lee, Donghwan
    Lim, Han-Dong
    Park, Jihoon
    Choi, Okyong
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2023, 68 (08) : 5006 - 5013