Adaptive Kernel-Width Selection for Kernel-Based Least-Squares Policy Iteration Algorithm

Authors
Wu, Jun [1]
Xu, Xin [1]
Zuo, Lei [1]
Li, Zhaobin [1]
Wang, Jian [1]
Affiliation
[1] Natl Univ Def Technol, Inst Automat, Changsha 410073, Hunan, Peoples R China
Source
ADVANCES IN NEURAL NETWORKS - ISNN 2011, PT II | 2011 / Vol. 6676
Keywords
reinforcement learning; sparsification; least-squares; gradient descent; kernel width;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Kernel-based Least-Squares Policy Iteration (KLSPI) algorithm provides a general reinforcement learning solution for large-scale Markov decision problems. In KLSPI, the Radial Basis Function (RBF) kernel is usually used to approximate the optimal value function with high precision. However, selecting a proper kernel width for the RBF kernel is critical to applying KLSPI successfully. In previous research, the kernel width was usually set manually or calculated in advance from the sample distribution, which requires prior knowledge or model information. In this paper, an adaptive kernel-width selection method is proposed for the KLSPI algorithm. First, a sparsification procedure with neighborhood analysis based on the ℓ2-ball of radius ε is adopted, which helps obtain a reduced kernel dictionary without presetting the kernel width. Second, a gradient descent method based on the Bellman Residual Error (BRE) is proposed to find a kernel width that minimizes the sum of the BRE. The experimental results show that the proposed method helps KLSPI approximate the true value function more accurately and, as a result, obtain a better control policy.
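The two steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so everything below is an assumption made for illustration. The function names (`rbf_kernel`, `sparsify_l2_ball`, `bre_sum`, `tune_kernel_width`) are hypothetical, the value function is defined over states only for a fixed policy (KLSPI itself works with state-action pairs), the weights come from an LSTD-style solve, and the gradient with respect to the kernel width is approximated by central finite differences with a step-halving safeguard in place of whatever analytic gradient the paper derives.

```python
import numpy as np

def rbf_kernel(X, D, sigma):
    """RBF features k(x, d) = exp(-||x - d||^2 / (2 sigma^2)) for rows of X and D."""
    sq_dist = ((X[:, None, :] - D[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def sparsify_l2_ball(X, eps):
    """Assumed neighborhood test: keep a sample only if it lies outside the
    l2-ball of radius eps around every element already in the dictionary.
    Note that no kernel width is needed for this step."""
    dictionary = [X[0]]
    for x in X[1:]:
        dists = np.linalg.norm(np.asarray(dictionary) - x, axis=1)
        if dists.min() > eps:
            dictionary.append(x)
    return np.asarray(dictionary)

def bre_sum(sigma, X, X_next, r, D, gamma, ridge=1e-6):
    """Sum of squared Bellman residuals for a fixed policy at kernel width sigma."""
    Phi = rbf_kernel(X, D, sigma)
    Phi_next = rbf_kernel(X_next, D, sigma)
    # LSTD-style fixed-point equations, lightly regularized (an assumption).
    A = Phi.T @ (Phi - gamma * Phi_next) + ridge * np.eye(D.shape[0])
    b = Phi.T @ r
    w = np.linalg.lstsq(A, b, rcond=None)[0]
    residual = r + gamma * (Phi_next @ w) - Phi @ w
    return float(residual @ residual)

def tune_kernel_width(X, X_next, r, D, gamma=0.9, sigma0=1.0,
                      lr=0.1, n_steps=50, h=1e-4, sigma_min=1e-3):
    """Gradient descent on sigma using a central finite-difference gradient.
    A step is accepted only if it lowers the BRE sum; otherwise lr is halved."""
    sigma = sigma0
    loss = bre_sum(sigma, X, X_next, r, D, gamma)
    for _ in range(n_steps):
        grad = (bre_sum(sigma + h, X, X_next, r, D, gamma)
                - bre_sum(sigma - h, X, X_next, r, D, gamma)) / (2.0 * h)
        candidate = max(sigma - lr * grad, sigma_min)
        cand_loss = bre_sum(candidate, X, X_next, r, D, gamma)
        if cand_loss < loss:
            sigma, loss = candidate, cand_loss
        else:
            lr *= 0.5
    return sigma, loss

# Toy data (hypothetical): a 1-D chain that drifts toward x = 1.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 1))
X_next = np.clip(X + 0.05, 0.0, 1.0)
r = -(X_next[:, 0] - 1.0) ** 2

D = sparsify_l2_ball(X, eps=0.1)
sigma, loss = tune_kernel_width(X, X_next, r, D)
print(f"dictionary size = {len(D)}, tuned sigma = {sigma:.4f}, BRE sum = {loss:.6f}")
```

In this sketch the dictionary is built before any kernel width is chosen, which mirrors the abstract's point that sparsification does not require presetting the width; the width is then tuned with the dictionary held fixed.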
Pages: 611 - 619
Page count: 9