Technical Update: Least-Squares Temporal Difference Learning

Cited by: 0
Authors
Justin A. Boyan
Affiliation
[1] ITA Software
Source
Machine Learning | 2002 / Vol. 49
Keywords
reinforcement learning; temporal difference learning; value function approximation; linear least-squares methods
DOI
Not available
Abstract
TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximations and λ = 0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto (1996, Machine Learning, 22(1–3), 33–57) eliminates all stepsize parameters and improves data efficiency.
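To make the contrast with incremental TD(0) concrete, the following is a minimal sketch (not code from the paper) of LSTD(0) with linear features: the transitions, the feature map `phi`, the discount `gamma`, and the small ridge term `reg` are all placeholder assumptions for illustration.

```python
import numpy as np

def lstd0(transitions, phi, gamma=0.95, reg=1e-6):
    """Sketch of LSTD(0) for linear value-function approximation.

    transitions: sequence of (s, r, s_next, done) tuples gathered under a fixed policy
    phi:         feature map, phi(s) -> 1-D numpy array of length k
    gamma:       discount factor
    reg:         small ridge term so the matrix stays invertible on limited data
    """
    k = len(phi(transitions[0][0]))
    A = reg * np.eye(k)           # accumulates phi(s) (phi(s) - gamma * phi(s'))^T
    b = np.zeros(k)               # accumulates r * phi(s)

    for s, r, s_next, done in transitions:
        x = phi(s)
        x_next = np.zeros(k) if done else phi(s_next)
        A += np.outer(x, x - gamma * x_next)
        b += r * x

    # A single linear solve replaces the stepsize-tuned incremental updates of TD(0).
    w = np.linalg.solve(A, b)
    return w                       # value estimate: V(s) ~= phi(s) @ w
```

Each observed transition is used to build the summary statistics A and b, so no transition is "forgotten" after one update and no stepsize schedule is needed, which is the data-efficiency and parameter-free advantage the abstract attributes to LSTD.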
Pages: 233–246
Page count: 13
References
  • [1] Bradtke, S. J., & Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22, 33–57.
  • [2] Moore, A. W., & Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13, 103–130.
  • [3] Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6, 215–219.
  • [4] Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42, 674–690.