Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes

Cited by: 3
Authors
Tadic, VB [1]
Affiliation
[1] Univ Sheffield, Dept Automat Control & Syst Engn, Sheffield S1 3JD, S Yorkshire, England
Keywords
temporal-difference learning; neuro-dynamic programming; reinforcement learning; stochastic approximation; Markov chains
DOI
10.1007/s10994-006-5835-z
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The mean-square asymptotic behavior of temporal-difference learning algorithms with constant step-sizes and linear function approximation is analyzed in this paper. The analysis is carried out for the case of a discounted cost function associated with a Markov chain with a finite dimensional state-space. Under mild conditions, an upper bound for the asymptotic mean-square error of these algorithms is determined as a function of the step-size. Moreover, under the same assumptions, it is also shown that this bound is linear in the step-size. The main results of the paper are illustrated with examples related to M/G/1 queues and nonlinear AR models with Markov switching.
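As a rough illustration of the setting described in the abstract, the following sketch runs TD(0) with a constant step-size and a linear (here scalar) function approximation on a toy Markov chain, and estimates the long-run mean-square distance to the TD fixed point. The two-state chain, costs, features, discount factor, and all function names are assumptions made up for this example; they are not taken from the paper, and the simulation only suggests, rather than proves, how the error depends on the step-size.

```python
import random

# Hypothetical toy problem (not from the paper): a 2-state Markov chain.
P = [[0.5, 0.5], [0.5, 0.5]]   # transition probabilities
COST = [1.0, 0.0]              # one-step cost per state
PHI = [1.0, 2.0]               # scalar feature per state (linear approximation)
GAMMA = 0.9                    # discount factor


def stationary(P, iters=1000):
    """Approximate the stationary distribution by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def td_fixed_point():
    """Solve the scalar TD(0) fixed-point equation A * theta = b."""
    pi = stationary(P)
    n = len(P)
    A = sum(
        pi[s] * PHI[s] * (PHI[s] - GAMMA * sum(P[s][t] * PHI[t] for t in range(n)))
        for s in range(n)
    )
    b = sum(pi[s] * PHI[s] * COST[s] for s in range(n))
    return b / A


def tail_mse(alpha, steps=120_000, burn_in=20_000, seed=0):
    """Run constant step-size TD(0); average (theta - theta*)^2 after burn-in."""
    rng = random.Random(seed)
    theta_star = td_fixed_point()
    theta, s = 0.0, 0
    total = 0.0
    for k in range(steps):
        s_next = 0 if rng.random() < P[s][0] else 1
        # Temporal-difference error for the observed transition s -> s_next.
        delta = COST[s] + GAMMA * PHI[s_next] * theta - PHI[s] * theta
        theta += alpha * delta * PHI[s]
        if k >= burn_in:
            total += (theta - theta_star) ** 2
        s = s_next
    return total / (steps - burn_in)


if __name__ == "__main__":
    for alpha in (0.1, 0.05, 0.01):
        print(f"alpha={alpha:<5} tail MSE={tail_mse(alpha):.5f}")
```

In this toy setup the estimated error should shrink as the step-size decreases, which is consistent with a bound that is linear in the step-size; the exact values depend on the seed and on the assumed chain.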
Pages: 107-133
Page count: 27