Technical Update: Least-Squares Temporal Difference Learning

Cited by: 0
Authors
Justin A. Boyan
Affiliation
[1] ITA Software
Source
Machine Learning | 2002 / Vol. 49
Keywords
reinforcement learning; temporal difference learning; value function approximation; linear least-squares methods
DOI
Not available
Abstract
TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximations and λ = 0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto (1996, Machine Learning, 22(1–3), 33–57) eliminates all stepsize parameters and improves data efficiency.
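To make the contrast with incremental TD(0) concrete, the following is a minimal sketch (not code from the paper) of LSTD(0) with linear features: the transitions, the feature map `phi`, the discount `gamma`, and the small ridge term `reg` are all placeholder assumptions for illustration.

```python
import numpy as np

def lstd0(transitions, phi, gamma=0.95, reg=1e-6):
    """Sketch of LSTD(0) for linear value-function approximation.

    transitions: sequence of (s, r, s_next, done) tuples gathered under a fixed policy
    phi:         feature map, phi(s) -> 1-D numpy array of length k
    gamma:       discount factor
    reg:         small ridge term so the matrix stays invertible on limited data
    """
    k = len(phi(transitions[0][0]))
    A = reg * np.eye(k)           # accumulates phi(s) (phi(s) - gamma * phi(s'))^T
    b = np.zeros(k)               # accumulates r * phi(s)

    for s, r, s_next, done in transitions:
        x = phi(s)
        x_next = np.zeros(k) if done else phi(s_next)
        A += np.outer(x, x - gamma * x_next)
        b += r * x

    # A single linear solve replaces the stepsize-tuned incremental updates of TD(0).
    w = np.linalg.solve(A, b)
    return w                       # value estimate: V(s) ~= phi(s) @ w
```

Each observed transition is used to build the summary statistics A and b, so no transition is "forgotten" after one update and no stepsize schedule is needed, which is the data-efficiency and parameter-free advantage the abstract attributes to LSTD.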
Pages: 233–246
Page count: 13
References
  • [1] Bradtke, S. J., & Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22, 33–57.
  • [2] Moore, A. W., & Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13, 103–130.
  • [3] Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6, 215–219.
  • [4] Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42, 674–690.