Gaussian-kernel-based adaptive critic design using two-phase value iteration

Times cited: 10
Authors
Chen, Xin [1, 2]
Wang, Wei [1, 2]
Cao, Weihua [1, 2]
Wu, Min [1, 2]
Affiliations
[1] China Univ Geosci, Sch Automat, Wuhan 430074, Hubei, Peoples R China
[2] Hubei Key Lab Adv Control & Intelligent Automat C, Wuhan 430079, Hubei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Adaptive critic design; Gaussian-kernel function; Two-phase iteration; Reinforcement learning; ONLINE LEARNING CONTROL; REINFORCEMENT;
DOI
10.1016/j.ins.2018.12.019
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Adaptive critic design is an efficient way to learn optimal action policies online, in which a critic network plays an important role in estimating value functions. Because of their good generalization and easy configuration, kernel-based methods are widely used to construct the critic network. Conventionally, the hyper-parameters of a kernel-based model must be predetermined, but selecting them empirically may mislead kernel-based regression by imposing an improper hypothesis space. To tackle this problem, this paper presents a two-phase iteration of value function approximation and hyper-parameter optimization for Gaussian-kernel-based adaptive critic design (GK-ACD), which not only approximates the value functions but also updates the hyper-parameters online. Since the two phases are strongly coupled, a theoretical proof based on stochastic approximation derives sufficient conditions that guarantee convergence and shows that the algorithm's performance relies mostly on the design of coordinated learning rates for the two phases. Finally, a series of numerical experiments is presented to discuss the necessity of the two-phase updates and the performance under the coordinated learning rates. (C) 2018 Elsevier Inc. All rights reserved.
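The two-phase idea in the abstract can be illustrated with a minimal sketch. This is not the paper's GK-ACD algorithm, whose update rules are not given in this record; it assumes a one-dimensional state, fixed kernel centers, a single shared kernel width `sigma`, and plain semi-gradient TD(0) updates. All names (`features`, `two_phase_step`, `alpha_w`, `alpha_s`) are hypothetical.

```python
import math


def features(x, centers, sigma):
    """Gaussian-kernel features exp(-(x - c)^2 / (2 * sigma^2)), one per center."""
    return [math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2)) for c in centers]


def value(x, w, centers, sigma):
    """Critic estimate V(x) as a weighted sum of Gaussian-kernel features."""
    return sum(wi * p for wi, p in zip(w, features(x, centers, sigma)))


def two_phase_step(x, r, x_next, w, centers, sigma, gamma, alpha_w, alpha_s):
    """One illustrative update on a transition (x, r, x_next).

    Assumption: both phases reuse the TD error computed before any update.
    """
    phi = features(x, centers, sigma)
    v = sum(wi * p for wi, p in zip(w, phi))
    delta = r + gamma * value(x_next, w, centers, sigma) - v

    # Phase 1: value function approximation (semi-gradient TD(0) on the weights).
    new_w = [wi + alpha_w * delta * p for wi, p in zip(w, phi)]

    # Phase 2: hyper-parameter update (semi-gradient step on the kernel width).
    # dV/dsigma = sum_i w_i * phi_i * (x - c_i)^2 / sigma^3
    dv_dsigma = sum(
        wi * p * (x - c) ** 2 / sigma ** 3 for wi, p, c in zip(w, phi, centers)
    )
    # Keep sigma positive; the floor value is an arbitrary choice for this sketch.
    new_sigma = max(sigma + alpha_s * delta * dv_dsigma, 1e-3)

    return new_w, new_sigma, delta
```

The two learning rates `alpha_w` and `alpha_s` are kept separate because the abstract attributes performance mainly to how the rates of the two phases are coordinated. How to choose them (for example, a slower rate for the kernel width) is an assumption left to the reader, not something stated in the abstract.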
Pages: 139-155
Number of pages: 17