Adaptive Kernel-Width Selection for Kernel-Based Least-Squares Policy Iteration Algorithm

Cited by: 0
Authors
Wu, Jun [1]
Xu, Xin [1]
Zuo, Lei [1]
Li, Zhaobin [1]
Wang, Jian [1]
Affiliations
[1] Natl Univ Def Technol, Inst Automat, Changsha 410073, Hunan, Peoples R China
Source
ADVANCES IN NEURAL NETWORKS - ISNN 2011, PT II | 2011 / Vol. 6676
Keywords
reinforcement learning; sparsification; least-squares; gradient descent; kernel width;
DOI
N/A
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Kernel-based Least-Squares Policy Iteration (KLSPI) algorithm provides a general reinforcement-learning solution for large-scale Markov decision problems. In KLSPI, the Radial Basis Function (RBF) kernel is usually used to approximate the optimal value function with high precision. However, selecting a proper kernel width for the RBF kernel is critical to applying KLSPI successfully. In previous research, the kernel width was usually set manually or computed in advance from the sample distribution, both of which require prior knowledge or model information. In this paper, an adaptive kernel-width selection method is proposed for the KLSPI algorithm. First, a sparsification procedure with neighborhood analysis based on the ℓ2-ball of radius ε is adopted, which yields a reduced kernel dictionary without presetting the kernel width. Second, a gradient-descent method based on the Bellman Residual Error (BRE) is proposed to find a kernel width that minimizes the sum of the BRE. Experimental results show that the proposed method helps KLSPI approximate the true value function more accurately and thereby obtain a better control policy.
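The abstract describes two algorithmic steps that are easier to follow with a concrete picture: (1) an ℓ2-ball-of-radius-ε novelty test that builds the kernel dictionary without fixing the kernel width, and (2) gradient descent on the summed BRE to choose the RBF width σ afterwards. The Python sketch below is illustrative only, not the authors' implementation: the batch-of-transitions interface, the state-value (rather than state-action) approximation, the ridge regularizer, the central-difference gradient (the paper derives the BRE gradient analytically), and all function names are assumptions.

```python
# Illustrative sketch only; see the lead-in above for the assumptions made here.
import numpy as np


def build_dictionary(states, eps):
    """Neighborhood-analysis sparsification: a state joins the dictionary
    only if it lies outside the l2-ball of radius eps around every state
    already stored, so no kernel width is needed at this stage."""
    dictionary = []
    for s in states:
        if all(np.linalg.norm(s - d) > eps for d in dictionary):
            dictionary.append(s)
    return np.array(dictionary)


def rbf_features(states, dictionary, sigma):
    """Gaussian RBF features k(s, d) = exp(-||s - d||^2 / (2 * sigma^2))."""
    sq_dists = ((states[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def summed_bre(sigma, S, R, S_next, dictionary, gamma, ridge=1e-6):
    """Fit least-squares value weights for this sigma, then return the sum
    of squared Bellman residuals r + gamma * V(s') - V(s)."""
    phi = rbf_features(S, dictionary, sigma)
    phi_next = rbf_features(S_next, dictionary, sigma)
    A = phi.T @ (phi - gamma * phi_next) + ridge * np.eye(phi.shape[1])
    w = np.linalg.solve(A, phi.T @ R)
    residual = R + gamma * (phi_next @ w) - phi @ w
    return float(residual @ residual)


def select_kernel_width(S, R, S_next, dictionary, gamma,
                        sigma=1.0, lr=1e-3, n_iters=200, h=1e-4):
    """Gradient descent on the summed BRE with respect to sigma, using a
    central-difference gradient purely for brevity."""
    for _ in range(n_iters):
        grad = (summed_bre(sigma + h, S, R, S_next, dictionary, gamma)
                - summed_bre(sigma - h, S, R, S_next, dictionary, gamma)) / (2.0 * h)
        sigma = max(sigma - lr * grad, 1e-3)  # keep the width positive
    return sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.uniform(-1.0, 1.0, size=(200, 2))      # hypothetical sampled states
    S_next = S + 0.1 * rng.normal(size=S.shape)    # hypothetical transitions
    R = -np.linalg.norm(S, axis=1)                 # hypothetical rewards
    D = build_dictionary(S, eps=0.3)
    print("dictionary size:", len(D))
    print("selected sigma :", select_kernel_width(S, R, S_next, D, gamma=0.95))
```

Note the division of labor this illustrates: ε alone controls the dictionary size in the first step, and σ is fitted only afterwards against the Bellman residual, which is what removes the need to preset the kernel width.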
Pages: 611-619
Page count: 9