Model-based least-squares policy evaluation

Cited by: 0
Authors
Lu, F [1]
Schuurmans, D [1]
Affiliations
[1] Univ Waterloo, Sch Comp Sci, Waterloo, ON N2L 3G1, Canada
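Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code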
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A popular form of policy evaluation for large Markov Decision Processes (MDPs) is the least-squares temporal differencing (TD) method. Least-squares TD methods handle large MDPs by relying on feature vectors, supplied as prior knowledge, that form a set of basis vectors and compress the system down to a tractable size. Model-based methods have largely been ignored in favour of model-free TD algorithms because of two perceived drawbacks: slower computation time and larger storage requirements. This paper challenges the perceived advantage of the temporal difference method over a model-based method in three distinct ways. First, it provides a new model-based approximate policy estimation method that produces solutions in less computation time than Boyan's least-squares TD method. Second, it introduces a new algorithm to derive basis vectors without any prior knowledge of the system. Third, it introduces an iteratively improving model-based value estimator that can run faster than standard TD methods. All algorithms require model storage but remain computationally competitive, in terms of accuracy, with model-free temporal differencing methods.
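For orientation, the contrast the abstract draws between model-free least-squares TD and a model-based estimate can be sketched as follows. This is a generic textbook-style illustration and not the algorithms proposed in the paper: the function names, the three-state example, the feature matrix `phi`, and the discount `gamma` are all assumptions made for this sketch. The model-based variant stores an empirical transition matrix and reward vector (the storage cost the abstract mentions) and then solves the same projected linear system, weighted by state visit counts.

```python
import numpy as np

def lstd(transitions, phi, gamma):
    """Model-free LSTD(0): accumulate A and b directly from samples."""
    k = phi.shape[1]
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, r, s_next in transitions:
        A += np.outer(phi[s], phi[s] - gamma * phi[s_next])
        b += phi[s] * r
    return np.linalg.solve(A, b)

def model_based_least_squares(transitions, phi, gamma):
    """Model-based variant (illustrative): estimate P and r, then solve."""
    n = phi.shape[0]
    counts = np.zeros((n, n))
    reward_sum = np.zeros(n)
    for s, r, s_next in transitions:
        counts[s, s_next] += 1
        reward_sum[s] += r
    visits = counts.sum(axis=1)
    safe = np.maximum(visits, 1)      # avoid division by zero for unseen states
    P = counts / safe[:, None]        # empirical transition model
    r_hat = reward_sum / safe         # empirical mean reward per state
    D = np.diag(visits)               # weight each state by its visit count
    A = phi.T @ D @ (phi - gamma * P @ phi)
    b = phi.T @ D @ r_hat
    return np.linalg.solve(A, b)

# Hypothetical data: 3 states, features = (bias, state index).
phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
transitions = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, 0), (1, 0.0, 0)]
gamma = 0.9

w_td = lstd(transitions, phi, gamma)
w_mb = model_based_least_squares(transitions, phi, gamma)
print(w_td, w_mb)
```

With the visit-count weighting used here, the two linear systems are algebraically identical, so both functions return the same weights on the same data. The sketch only illustrates the storage trade-off and says nothing about the speed comparisons reported in the paper.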
Pages: 342 - 352
Number of pages: 11
Related papers
50 in total
  • [11] Parallel Least-Squares Policy Iteration
    Wang, Jun-Kun
    Lin, Shou-De
    PROCEEDINGS OF 3RD IEEE/ACM INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS, (DSAA 2016), 2016, : 166 - 173
  • [12] Model-based material decomposition with a penalized nonlinear least-squares CT reconstruction algorithm
    Tilley, Steven, II
    Zbijewski, Wojciech
    Stayman, J. Webster
    PHYSICS IN MEDICINE AND BIOLOGY, 2019, 64 (03):
  • [13] Model-based estimation of cylinder pressure sensor offset using least-squares methods
    Tunestål, P
    Hedrick, JK
    Johansson, R
    PROCEEDINGS OF THE 40TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5, 2001, : 3740 - 3745
  • [14] Model-based least squares optimal interpolation
    Gilman, A.
    Bailey, D. G.
    Marsland, S.
    2009 24TH INTERNATIONAL CONFERENCE IMAGE AND VISION COMPUTING NEW ZEALAND (IVCNZ 2009), 2009, : 124 - 129
  • [15] The autocovariance least-squares method for estimating covariances: Application to model-based control of chemical reactors
    Odelson, Brian J.
    Lutz, Alexander
    Rawlings, James B.
    IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, 2006, 14 (03) : 532 - 540
  • [16] Auxiliary model-based least-squares identification methods for Hammerstein output-error systems
    Ding, Feng
    Shi, Yang
    Chen, Tongwen
    SYSTEMS & CONTROL LETTERS, 2007, 56 (05) : 373 - 380
  • [17] DANGERS OF EVALUATION BY LEAST-SQUARES METHOD
    FOURASTIE, J
    REVUE D ECONOMIE POLITIQUE, 1976, 86 (03) : 450 - 462
  • [18] Experience replay for least-squares policy iteration
    Liu, Quan (quanliu@suda.edu.cn), 1600, Institute of Electrical and Electronics Engineers Inc. (01):
  • [19] Experience Replay for Least-Squares Policy Iteration
    Quan Liu
    Xin Zhou
    Fei Zhu
    Qiming Fu
    Yuchen Fu
    IEEE/CAA Journal of Automatica Sinica, 2014, 1 (03) : 274 - 281
  • [20] WHEN LEAST-SQUARES SQUARES LEAST
    ALCHALABI, M
    GEOPHYSICAL PROSPECTING, 1992, 40 (03) : 359 - 378