Average cost temporal-difference learning

Authors
Tsitsiklis, JN [1]
Van Roy, B [1]
Affiliations
[1] MIT, Informat & Decis Syst Lab, Cambridge, MA 02139 USA
Source
PROCEEDINGS OF THE 36TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5 | 1997
Keywords
DOI
Not available
Chinese Library Classification
TP [Automation and computer technology];
Discipline classification code
0812 ;
Abstract
We describe a variant of temporal-difference learning that approximates the average and differential costs of an irreducible aperiodic Markov chain. Approximations consist of linear combinations of fixed basis functions whose weights are incrementally updated during a single endless trajectory of the Markov chain. We present results concerning convergence and the limit of convergence. We also provide a bound on the resulting approximation error that exhibits an interesting dependence on the "mixing time" of the Markov chain. The results parallel previous work by the authors involving approximations of discounted cost-to-go.
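The abstract gives no pseudocode, so the following is only a minimal sketch of how an average-cost TD(lambda) update with linear basis functions might look. The function name `average_cost_td`, the two-state example chain, the single basis function, and the step-size schedule `20 / (400 + t)` are all illustrative assumptions, not taken from the paper. The sketch keeps a running estimate `mu` of the average cost and a weight vector `r` for the differential cost, both updated along one simulated trajectory.

```python
import random


def average_cost_td(P, g, phi, lam=0.5, steps=300_000, seed=0):
    """Sketch of average-cost TD(lambda) with linear features (assumed form).

    P[x][y] : transition probability from state x to state y
    g[x]    : per-stage cost of state x
    phi[x]  : feature (basis function) vector of state x
    Returns (mu, r): average-cost estimate and differential-cost weights.
    """
    rng = random.Random(seed)
    states = range(len(P))
    r = [0.0] * len(phi[0])   # weights of the basis functions
    mu = 0.0                  # running estimate of the average cost
    x = 0                     # arbitrary start state (assumption)
    z = list(phi[x])          # eligibility trace, initialised to phi(x_0)
    for t in range(steps):
        gamma = 20.0 / (400.0 + t)  # illustrative diminishing step size
        y = rng.choices(states, weights=P[x])[0]  # simulate one transition
        v_x = sum(w * f for w, f in zip(r, phi[x]))
        v_y = sum(w * f for w, f in zip(r, phi[y]))
        # Temporal difference: the cost is measured relative to mu,
        # and there is no discount factor.
        d = g[x] - mu + v_y - v_x
        r = [w + gamma * d * e for w, e in zip(r, z)]
        mu += gamma * (g[x] - mu)
        # Trace decays by lam only and accumulates the next state's features.
        z = [lam * e + f for e, f in zip(z, phi[y])]
        x = y
    return mu, r


# Hypothetical two-state chain used purely for illustration.
P = [[0.9, 0.1], [0.2, 0.8]]
g = [0.0, 3.0]
phi = [[0.0], [1.0]]  # one basis function; constants are not in its span

if __name__ == "__main__":
    mu, r = average_cost_td(P, g, phi)
    print(mu, r)
```

For this example chain the stationary distribution is (2/3, 1/3), so the true average cost is 1 and the differential costs of the two states differ by 10; under the assumptions above, `mu` and `r[0]` should settle near those values. The single basis function is chosen so that the constant function is not a linear combination of the features, since differential costs are only defined up to an additive constant.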
Pages: 498 - 502
Page count: 5
Related papers
50 in total
  • [1] Average cost temporal-difference learning
    Lab. for Info. and Decision Systems, Massachusetts Inst. of Technology, Room 35-209, 77 Massachusetts Avenue, Cambridge, MA 02139-4307, United States
    Automatica, (11) : 1799 - 1808
  • [2] Average cost temporal-difference learning
    Tsitsiklis, JN
    Van Roy, B
    AUTOMATICA, 1999, 35 (11) : 1799 - 1808
  • [3] On Average Versus Discounted Reward Temporal-Difference Learning
    John N. Tsitsiklis
    Benjamin Van Roy
    Machine Learning, 2002, 49 : 179 - 191
  • [4] On average versus discounted reward temporal-difference learning
    Tsitsiklis, JN
    Van Roy, B
    MACHINE LEARNING, 2002, 49 (2-3) : 179 - 191
  • [5] Temporal-difference learning and applications in finance
    Van Roy, B
    COMPUTATIONAL FINANCE 1999, 2000 : 447 - 461
  • [6] True Online Temporal-Difference Learning
    van Seijen, Harm
    Mahmood, A. Rupam
    Pilarski, Patrick M.
    Machado, Marlos C.
    Sutton, Richard S.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17
  • [7] An Analysis of Quantile Temporal-Difference Learning
    Rowland, Mark
    Munos, Remi
    Azar, Mohammad Gheshlaghi
    Tang, Yunhao
    Ostrovski, Georg
    Harutyunyan, Anna
    Tuyls, Karl
    Bellemare, Marc G.
    Dabney, Will
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [8] Temporal-Difference Learning for Online Reachability Analysis
    Akametalu, Anayo K.
    Tomlin, Claire J.
    2015 EUROPEAN CONTROL CONFERENCE (ECC), 2015, : 2508 - 2513
  • [9] Advanced Temporal-Difference Learning for Intrusion Detection
    Sukhanov, A. V.
    Kovalev, S. M.
    Styskala, V
    IFAC PAPERSONLINE, 2015, 48 (04) : 43 - 48
  • [10] Loosely Consistent Emphatic Temporal-Difference Learning
    He, Jiamin
    Che, Fengdi
    Wan, Yi
    Mahmood, A. Rupam
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 849 - 859