Model-based least-squares policy evaluation

Cited by: 0
Authors
Lu, F [1]
Schuurmans, D [1]
Affiliations
[1] Univ Waterloo, Sch Comp Sci, Waterloo, ON N2L 3G1, Canada
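Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract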
A popular form of policy evaluation for large Markov Decision Processes (MDPs) is the least-squares temporal differencing (TD) method. Least-squares TD methods handle large MDPs by requiring feature vectors supplied as prior knowledge, which form a set of basis vectors that compress the system down to tractable levels. Model-based methods have largely been ignored in favour of model-free TD algorithms due to two perceived drawbacks: slower computation time and larger storage requirements. This paper challenges the perceived advantage of the temporal difference method over a model-based method in three distinct ways. First, it provides a new model-based approximate policy estimation method which produces solutions in less computation time than Boyan's least-squares TD method. Second, it introduces a new algorithm to derive basis vectors without any prior knowledge of the system. Third, it introduces an iteratively improving model-based value estimator that can run faster than standard TD methods. All algorithms require model storage but remain computationally competitive in terms of accuracy with model-free temporal differencing methods.
Pages: 342-352
Number of pages: 11