Average cost temporal-difference learning

Cited by: 0
Authors
Tsitsiklis, JN [1]
Van Roy, B [1]
Affiliation
[1] MIT, Informat & Decis Syst Lab, Cambridge, MA 02139 USA
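Source
PROCEEDINGS OF THE 36TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5 | 1997
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract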
We describe a variant of temporal-difference learning that approximates average and differential costs of an irreducible aperiodic Markov chain. Approximations are comprised of linear combinations of fixed basis functions whose weights are incrementally updated during a single endless trajectory of the Markov chain. We present results concerning convergence and the limit of convergence. We also provide a bound on the resulting approximation error that exhibits an interesting dependence on the "mixing time" of the Markov chain. The results parallel previous work by the authors, involving approximations of discounted cost-to-go.
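The abstract describes the method only in words, so the following is a minimal Python sketch of an average-cost TD(λ)-style update with linear features, not the authors' implementation. The update form (a running average-cost estimate `mu`, a temporal difference built from `g[x] - mu`, and an eligibility trace `z`) follows the way this algorithm is commonly stated. The function name `average_cost_td`, the step-size schedule `c / (t + t0)`, the plain sample average used for `mu`, and the two-state toy chain are all assumptions made for illustration.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def average_cost_td(P, g, phi, lam=0.0, steps=200_000, seed=0, c=30.0, t0=300.0):
    """Run one long trajectory and return (mu, r).

    P   : transition matrix as a list of rows (assumed irreducible, aperiodic)
    g   : per-state cost
    phi : per-state feature vectors (the fixed basis functions)
    lam : trace-decay parameter lambda in [0, 1)
    c, t0 : hypothetical step-size constants, gamma_t = c / (t + t0)
    """
    rng = random.Random(seed)
    states = range(len(P))
    r = [0.0] * len(phi[0])   # weights of the linear approximation
    mu = 0.0                  # running estimate of the average cost
    x = 0
    z = list(phi[x])          # eligibility trace, starts at phi(x_0)
    for t in range(steps):
        x_next = rng.choices(states, weights=P[x])[0]
        # Temporal difference for the differential cost.
        d = g[x] - mu + dot(phi[x_next], r) - dot(phi[x], r)
        gamma = c / (t + t0)
        r = [ri + gamma * d * zi for ri, zi in zip(r, z)]
        # Assumption: mu is a plain sample average of the observed costs.
        mu += (g[x] - mu) / (t + 1)
        z = [lam * zi + fi for zi, fi in zip(z, phi[x_next])]
        x = x_next
    return mu, r


# Toy two-state chain (an assumption, not from the paper).
# Stationary distribution is (2/3, 1/3), so the true average cost is 1.
TOY_P = [[0.9, 0.1], [0.2, 0.8]]
TOY_G = [0.0, 3.0]
# One indicator feature for state 1; the constant vector is not in its span.
TOY_PHI = [[0.0], [1.0]]

if __name__ == "__main__":
    mu, r = average_cost_td(TOY_P, TOY_G, TOY_PHI)
    print("average cost estimate:", mu)
    print("weight:", r[0])
```

In this toy chain the differential costs of the two states differ by 10, and that difference is exactly representable with the single feature, so the weight should settle near 10 while `mu` settles near 1. With features that cannot represent the differential cost exactly, only an approximation is obtained, which is the situation the error bound in the abstract addresses.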
Pages: 498 - 502
Page count: 5
Related papers
50 in total
  • [31] On the asymptotic behavior of a constant stepsize temporal-difference learning algorithm
    Tadic, A
    COMPUTATIONAL LEARNING THEORY, 1999, 1572 : 126 - 137
  • [32] Implementing Temporal-Difference Learning with the Scaled Conjugate Gradient Algorithm
    Tasos Falas
    Andreas Stafylopatis
    Neural Processing Letters, 2005, 22 : 361 - 375
  • [33] Fuzzy interpretation for temporal-difference learning in anomaly detection problems
    Sukhanov, A. V.
    Kovalev, S. M.
    Styskala, V.
    BULLETIN OF THE POLISH ACADEMY OF SCIENCES-TECHNICAL SCIENCES, 2016, 64 (03) : 625 - 632
  • [34] Fuzzy interpretation for temporal-difference learning in anomaly detection problems
    Sukhanov A.V.
    Kovalev S.M.
    Stýskala V.
    Sukhanov, A.V. (drewnia@rambler.ru), 1600, Polska Akademia Nauk (64) : 625 - 632
  • [35] Online Multi-Task Gradient Temporal-Difference Learning
    Sreenivasan, Vishnu Purushothaman
    Ammar, Haitham Bou
    Eaton, Eric
    PROCEEDINGS OF THE TWENTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2014, : 3136 - 3137
  • [36] Using temporal-difference learning for multi-agent bargaining
    Huang, Shiu-li
    Lin, Fu-ren
    ELECTRONIC COMMERCE RESEARCH AND APPLICATIONS, 2008, 7 (04) : 432 - 442
  • [37] Temporal-Difference Q-learning in Active Fault Diagnosis
    Skach, Jan
    Puncochar, Ivo
    Lewis, Frank L.
    2016 3RD CONFERENCE ON CONTROL AND FAULT-TOLERANT SYSTEMS (SYSTOL), 2016, : 287 - 292
  • [38] Temporal-Difference Learning An Online Support Vector Regression Approach
    Teixeira, Hugo Tanzarella
    Bottura, Celso Pascoli
    ICIMCO 2015 PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON INFORMATICS IN CONTROL, AUTOMATION AND ROBOTICS, VOL. 1, 2015, : 318 - 323
  • [39] Correlation minimizing replay memory in temporal-difference reinforcement learning
    Ramicic, Mirza
    Bonarini, Andrea
    NEUROCOMPUTING, 2020, 393 : 91 - 100
  • [40] Implementing temporal-difference learning with the scaled conjugate gradient algorithm
    Falas, T
    Stafylopatis, A
    NEURAL PROCESSING LETTERS, 2005, 22 (03) : 361 - 375