Sparse Kernel-Based Least Squares Temporal Difference with Prioritized Sweeping

Cited by: 0
Authors
Sun, Cijia [1 ]
Ling, Xinghong [1 ]
Fu, Yuchen [1 ]
Liu, Quan [1 ]
Zhu, Haijun [1 ]
Zhai, Jianwei [1 ]
Zhang, Peng [1 ]
Institutions
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215000, Peoples R China
Source
NEURAL INFORMATION PROCESSING, ICONIP 2016, PT III | 2016 / Vol. 9949
Keywords
Reinforcement learning; Prioritized sweeping; Sparse kernel; Least squares temporal difference; POLICY ITERATION;
DOI
10.1007/978-3-319-46675-0_25
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Improving the efficiency of algorithms for large-scale or continuous-space reinforcement learning (RL) problems has been an active research topic. The kernel-based least squares temporal difference (KLSTD) algorithm can solve continuous-space RL problems, but it suffers from high computational complexity due to its kernel representation and the matrix computations it requires. To address this problem, this paper proposes sparse kernel-based least squares temporal difference with prioritized sweeping (PS-SKLSTD). PS-SKLSTD consists of two parts: learning and planning. In the learning process, an ALD-based sparse kernel function represents the value function, and the parameter vectors are updated via the Sherman-Morrison formula. In the planning process, prioritized sweeping selects the next state-action pair to update. Experimental results demonstrate that PS-SKLSTD converges faster and computes more efficiently than KLSTD.
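The abstract names two numerical ingredients of the learning step: an approximate linear dependence (ALD) test to keep the kernel dictionary sparse, and a Sherman-Morrison rank-one update so the inverse of the accumulated matrix never needs to be recomputed from scratch. A minimal NumPy sketch of these two ideas follows; the Gaussian kernel, the function names, and the tolerance `nu` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Gaussian (RBF) kernel; the bandwidth sigma is an assumed choice
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2)))

def ald_test(dictionary, K_inv, x, nu=0.1, sigma=1.0):
    """ALD sparsification test: x is 'novel' (should be added to the
    dictionary) if its kernel features cannot be approximated by the
    current dictionary elements within tolerance nu."""
    k_vec = np.array([gaussian_kernel(d, x, sigma) for d in dictionary])
    c = K_inv @ k_vec                        # least-squares coefficients
    delta = gaussian_kernel(x, x, sigma) - k_vec @ c
    return delta > nu, k_vec

def sherman_morrison_update(A_inv, u, v):
    """Rank-one inverse update: given A^{-1}, return (A + u v^T)^{-1}
    in O(n^2) instead of a fresh O(n^3) inversion."""
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)
```

In an LSTD-style learner, `u` would be the sparse kernel feature of the current state and `v` the temporal-difference feature φ(s) − γφ(s'), so the inverse of the accumulated matrix can be maintained incrementally as samples arrive.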
Pages: 221-230 (10 pages)