Adaptive Kernel-Width Selection for Kernel-Based Least-Squares Policy Iteration Algorithm

Cited by: 0
Authors
Wu, Jun [1 ]
Xu, Xin [1 ]
Zuo, Lei [1 ]
Li, Zhaobin [1 ]
Wang, Jian [1 ]
Affiliation
[1] Natl Univ Def Technol, Inst Automat, Changsha 410073, Hunan, Peoples R China
Source
ADVANCES IN NEURAL NETWORKS - ISNN 2011, PT II | 2011 / Vol. 6676
Keywords
reinforcement learning; sparsification; least-squares; gradient descent; kernel width;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Kernel-based Least-squares Policy Iteration (KLSPI) algorithm provides a general reinforcement learning solution for large-scale Markov decision problems. In KLSPI, the Radial Basis Function (RBF) kernel is usually used to approximate the optimal value function with high precision. However, selecting a proper kernel width for the RBF kernel function is critical to applying KLSPI successfully. In previous research, the kernel width was usually set manually or calculated in advance from the sample distribution, which requires prior knowledge or model information. In this paper, an adaptive kernel-width selection method is proposed for the KLSPI algorithm. First, a sparsification procedure with neighborhood analysis based on the ℓ2-ball of radius ε is adopted, which helps obtain a reduced kernel dictionary without presetting the kernel width. Second, a gradient descent method based on the Bellman Residual Error (BRE) is proposed to find a kernel width that minimizes the sum of the BRE. The experimental results show that the proposed method helps KLSPI approximate the true value function more accurately and, ultimately, obtain a better control policy.
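The first step in the abstract, building a reduced kernel dictionary from ℓ2-ball neighborhoods, can be illustrated with a minimal sketch. The abstract does not give the paper's actual procedure, so the greedy rule below (keep a sample only if it lies outside the ε-ball of every sample already kept) is an assumption, and the names `sparsify`, `l2_distance`, and `rbf_kernel` are hypothetical. The sketch only shows why such a dictionary can be built before any kernel width is chosen: the selection rule uses distances in the sample space, not kernel values.

```python
import math


def l2_distance(a, b):
    """Euclidean (l2) distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sparsify(samples, eps):
    """Greedy dictionary construction (assumed rule, not the paper's):
    keep a sample only if it is farther than eps from every sample
    already in the dictionary. No kernel width is needed here."""
    dictionary = []
    for s in samples:
        if all(l2_distance(s, d) > eps for d in dictionary):
            dictionary.append(s)
    return dictionary


def rbf_kernel(a, b, width):
    """Standard RBF kernel; the width is chosen after sparsification."""
    return math.exp(-l2_distance(a, b) ** 2 / (2.0 * width ** 2))


# Hypothetical 2-D samples: two tight pairs and one isolated point.
samples = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0), (1.0, 1.04), (2.0, 0.0)]
dictionary = sparsify(samples, eps=0.1)

# Kernel features of a query point against the reduced dictionary,
# evaluated for one candidate width.
features = [rbf_kernel((0.5, 0.5), d, width=1.0) for d in dictionary]
```

With the dictionary fixed, the kernel width remains a free scalar that a later step, such as the BRE-based gradient descent the abstract mentions, could tune without rebuilding the dictionary.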
Pages: 611-619
Page count: 9