Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes

Cited by: 3
Author
Tadic, VB [1]
Affiliation
[1] Univ Sheffield, Dept Automat Control & Syst Engn, Sheffield S1 3JD, S Yorkshire, England
Keywords
temporal-difference learning; neuro-dynamic programming; reinforcement learning; stochastic approximation; Markov chains
DOI
10.1007/s10994-006-5835-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper analyzes the mean-square asymptotic behavior of temporal-difference learning algorithms with constant step-sizes and linear function approximation. The analysis is carried out for the case of a discounted cost function associated with a Markov chain with a finite-dimensional state-space. Under mild conditions, an upper bound on the asymptotic mean-square error of these algorithms is determined as a function of the step-size. Moreover, under the same assumptions, this bound is shown to be linear in the step-size. The main results of the paper are illustrated with examples related to M/G/1 queues and nonlinear AR models with Markov switching.
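To make the setting of the abstract concrete, here is a minimal Python sketch (not the paper's own code or exact algorithm) of TD(λ) with linear function approximation and a constant step-size α on a finite Markov chain with discounted cost. Everything in it is an illustrative assumption: the function names are hypothetical, as are the toy chain `P`, the costs, the feature matrix `phi`, the discount factor `beta` and the trace parameter `lam`. The reference point `theta_star` is taken to be the standard TD(λ) limit, i.e. the root of A θ + b = 0 with A = Φᵀ D (I - βλP)⁻¹ (βP - I) Φ and b = Φᵀ D (I - βλP)⁻¹ c, where D = diag(π) holds the stationary distribution. `tail_mse` is only an empirical proxy for the asymptotic mean-square error that the paper bounds.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution pi of an irreducible finite chain (pi @ P = pi, sum(pi) = 1)."""
    n = P.shape[0]
    # Stack the balance equations (P^T - I) pi = 0 with the normalisation sum(pi) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return pi


def td_fixed_point(P, cost, phi, beta, lam):
    """Hypothetical helper: limit theta* of TD(lambda), the root of A theta + b = 0, with
    A = Phi^T D (I - beta*lam*P)^{-1} (beta*P - I) Phi,
    b = Phi^T D (I - beta*lam*P)^{-1} c, D = diag(pi). Returns (theta*, A, b)."""
    n = P.shape[0]
    D = np.diag(stationary_distribution(P))
    M = np.linalg.inv(np.eye(n) - beta * lam * P)
    A = phi.T @ D @ M @ (beta * P - np.eye(n)) @ phi
    b = phi.T @ D @ M @ cost
    return np.linalg.solve(A, -b), A, b


def td_lambda_constant_step(P, cost, phi, alpha, beta, lam, n_steps, seed=0, x0=0):
    """Hypothetical helper: simulate the chain and run TD(lambda) with constant step-size alpha.

    Returns an array of shape (n_steps + 1, d) holding theta_0, ..., theta_n.
    """
    rng = np.random.default_rng(seed)
    n, d = phi.shape
    cum_P = np.cumsum(P, axis=1)  # row-wise CDFs used to sample the next state
    u = rng.random(n_steps)       # one uniform draw per transition
    theta = np.zeros(d)
    z = np.zeros(d)               # eligibility trace
    history = np.empty((n_steps + 1, d))
    history[0] = theta
    x = x0
    for t in range(n_steps):
        # Inverse-CDF sampling; min() guards against round-off in the last CDF entry.
        x_next = min(int(np.searchsorted(cum_P[x], u[t], side="right")), n - 1)
        z = beta * lam * z + phi[x]
        # Temporal difference for the discounted cost: c(x) + beta*V(x') - V(x), V = phi @ theta.
        delta = cost[x] + beta * (phi[x_next] @ theta) - phi[x] @ theta
        theta = theta + alpha * delta * z  # constant step-size: no decay of alpha
        history[t + 1] = theta
        x = x_next
    return history


def tail_mse(history, theta_star, burn_in):
    """Empirical mean-square error of theta_t around theta* after a burn-in period."""
    err = history[burn_in:] - theta_star
    return float(np.mean(np.sum(err**2, axis=1)))
```

With a constant step-size the iterates do not settle at θ* but keep fluctuating around it, which is why the error is averaged over the tail of the run rather than read off the final iterate. One way to probe the linear-in-step-size behavior empirically is to call `td_lambda_constant_step` for several values of `alpha` (with `n_steps` large enough that the burn-in covers the transient) and compare `tail_mse` across the runs.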
Pages: 107-133
Number of pages: 27