Sparse Kernel-Based Least Squares Temporal Difference with Prioritized Sweeping

Cited by: 0
Authors
Sun, Cijia [1 ]
Ling, Xinghong [1 ]
Fu, Yuchen [1 ]
Liu, Quan [1 ]
Zhu, Haijun [1 ]
Zhai, Jianwei [1 ]
Zhang, Peng [1 ]
Institutions
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215000, Peoples R China
Source
NEURAL INFORMATION PROCESSING, ICONIP 2016, PT III | 2016 / Vol. 9949
Keywords
Reinforcement learning; Prioritized sweeping; Sparse kernel; Least squares temporal difference; POLICY ITERATION;
DOI
10.1007/978-3-319-46675-0_25
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Improving the efficiency of algorithms for large-scale or continuous-space reinforcement learning (RL) problems has been an active research topic. The kernel-based least squares temporal difference (KLSTD) algorithm can solve continuous-space RL problems, but it suffers from high computational complexity due to its kernel representation and the matrix computations it requires. To address this problem, this paper proposes sparse kernel-based least squares temporal difference with prioritized sweeping (PS-SKLSTD). PS-SKLSTD consists of two parts: learning and planning. In the learning process, an ALD-based sparse kernel function represents the value function, and the parameter vectors are updated via the Sherman-Morrison formula. In the planning process, prioritized sweeping selects the next state-action pair to update. Experimental results demonstrate that PS-SKLSTD converges faster and computes more efficiently than KLSTD.
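The abstract names two numerical ingredients of the learning step: an approximate linear dependence (ALD) test to keep the kernel dictionary sparse, and a Sherman-Morrison rank-one update so the inverse of the accumulated matrix never needs to be recomputed from scratch. A minimal NumPy sketch of these two ideas follows; the Gaussian kernel, the function names, and the tolerance `nu` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Gaussian (RBF) kernel; the bandwidth sigma is an assumed choice
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2)))

def ald_test(dictionary, K_inv, x, nu=0.1, sigma=1.0):
    """ALD sparsification test: x is 'novel' (should be added to the
    dictionary) if its kernel features cannot be approximated by the
    current dictionary elements within tolerance nu."""
    k_vec = np.array([gaussian_kernel(d, x, sigma) for d in dictionary])
    c = K_inv @ k_vec                        # least-squares coefficients
    delta = gaussian_kernel(x, x, sigma) - k_vec @ c
    return delta > nu, k_vec

def sherman_morrison_update(A_inv, u, v):
    """Rank-one inverse update: given A^{-1}, return (A + u v^T)^{-1}
    in O(n^2) instead of a fresh O(n^3) inversion."""
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)
```

In an LSTD-style learner, `u` would be the sparse kernel feature of the current state and `v` the temporal-difference feature φ(s) − γφ(s'), so the inverse of the accumulated matrix can be maintained incrementally as samples arrive.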
Pages: 221-230 (10 pages)