Adaptive Kernel-Width Selection for Kernel-Based Least-Squares Policy Iteration Algorithm

Cited by: 0
Authors
Wu, Jun [1]
Xu, Xin [1]
Zuo, Lei [1]
Li, Zhaobin [1]
Wang, Jian [1]
Affiliations
[1] Natl Univ Def Technol, Inst Automat, Changsha 410073, Hunan, Peoples R China
Source
ADVANCES IN NEURAL NETWORKS - ISNN 2011, PT II | 2011 / Vol. 6676
Keywords
reinforcement learning; sparsification; least-squares; gradient descent; kernel width;
DOI
N/A
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Kernel-based Least-Squares Policy Iteration (KLSPI) algorithm provides a general reinforcement-learning solution for large-scale Markov decision problems. In KLSPI, the Radial Basis Function (RBF) kernel is usually used to approximate the optimal value function with high precision. However, selecting a proper kernel width for the RBF kernel is critical to applying KLSPI successfully. In previous research, the kernel width was usually set manually or computed in advance from the sample distribution, both of which require prior knowledge or model information. In this paper, an adaptive kernel-width selection method is proposed for the KLSPI algorithm. First, a sparsification procedure with neighborhood analysis based on the ℓ2-ball of radius ε is adopted, which yields a reduced kernel dictionary without presetting the kernel width. Second, a gradient-descent method based on the Bellman Residual Error (BRE) is proposed to find a kernel width that minimizes the sum of the BRE. Experimental results show that the proposed method helps KLSPI approximate the true value function more accurately and thereby obtain a better control policy.
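The abstract describes two algorithmic steps that are easier to follow with a concrete picture: (1) an ℓ2-ball-of-radius-ε novelty test that builds the kernel dictionary without fixing the kernel width, and (2) gradient descent on the summed BRE to choose the RBF width σ afterwards. The Python sketch below is illustrative only, not the authors' implementation: the batch-of-transitions interface, the state-value (rather than state-action) approximation, the ridge regularizer, the central-difference gradient (the paper derives the BRE gradient analytically), and all function names are assumptions.

```python
# Illustrative sketch only; see the lead-in above for the assumptions made here.
import numpy as np


def build_dictionary(states, eps):
    """Neighborhood-analysis sparsification: a state joins the dictionary
    only if it lies outside the l2-ball of radius eps around every state
    already stored, so no kernel width is needed at this stage."""
    dictionary = []
    for s in states:
        if all(np.linalg.norm(s - d) > eps for d in dictionary):
            dictionary.append(s)
    return np.array(dictionary)


def rbf_features(states, dictionary, sigma):
    """Gaussian RBF features k(s, d) = exp(-||s - d||^2 / (2 * sigma^2))."""
    sq_dists = ((states[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def summed_bre(sigma, S, R, S_next, dictionary, gamma, ridge=1e-6):
    """Fit least-squares value weights for this sigma, then return the sum
    of squared Bellman residuals r + gamma * V(s') - V(s)."""
    phi = rbf_features(S, dictionary, sigma)
    phi_next = rbf_features(S_next, dictionary, sigma)
    A = phi.T @ (phi - gamma * phi_next) + ridge * np.eye(phi.shape[1])
    w = np.linalg.solve(A, phi.T @ R)
    residual = R + gamma * (phi_next @ w) - phi @ w
    return float(residual @ residual)


def select_kernel_width(S, R, S_next, dictionary, gamma,
                        sigma=1.0, lr=1e-3, n_iters=200, h=1e-4):
    """Gradient descent on the summed BRE with respect to sigma, using a
    central-difference gradient purely for brevity."""
    for _ in range(n_iters):
        grad = (summed_bre(sigma + h, S, R, S_next, dictionary, gamma)
                - summed_bre(sigma - h, S, R, S_next, dictionary, gamma)) / (2.0 * h)
        sigma = max(sigma - lr * grad, 1e-3)  # keep the width positive
    return sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.uniform(-1.0, 1.0, size=(200, 2))      # hypothetical sampled states
    S_next = S + 0.1 * rng.normal(size=S.shape)    # hypothetical transitions
    R = -np.linalg.norm(S, axis=1)                 # hypothetical rewards
    D = build_dictionary(S, eps=0.3)
    print("dictionary size:", len(D))
    print("selected sigma :", select_kernel_width(S, R, S_next, D, gamma=0.95))
```

Note the division of labor this illustrates: ε alone controls the dictionary size in the first step, and σ is fitted only afterwards against the Bellman residual, which is what removes the need to preset the kernel width.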
Pages: 611-619
Page count: 9