Gaussian-kernel-based adaptive critic design using two-phase value iteration

Times cited: 10

Authors:

Chen, Xin [1,2]

Wang, Wei [1,2]

Cao, Weihua [1,2]

Wu, Min [1,2]

Affiliations:

[1] China Univ Geosci, Sch Automat, Wuhan 430074, Hubei, Peoples R China

[2] Hubei Key Lab Adv Control & Intelligent Automat C, Wuhan 430079, Hubei, Peoples R China

Source:

INFORMATION SCIENCES | 2019 / Vol. 482

Funding:

National Natural Science Foundation of China;

Keywords:

Adaptive critic design; Gaussian-kernel function; Two-phase iteration; Reinforcement learning; ONLINE LEARNING CONTROL; REINFORCEMENT;

DOI:

10.1016/j.ins.2018.12.019

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Adaptive critic design is an efficient way to learn optimal action policies online, in which a critic network plays an important role in estimating value functions. Because of their good generalization and easy configuration, kernel-based methods are widely used to construct the critic network. Conventionally, the hyper-parameters of a kernel-based model must be predetermined, but selecting them empirically may mislead kernel-based regression with an improper modeling hypothesis space. To tackle this problem, this paper presents a two-phase iteration of value function approximation and hyper-parameter optimization for Gaussian-kernel-based adaptive critic design (GK-ACD), which not only approximates the value functions but also updates the hyper-parameters online. Since the two phases are strongly coupled, a theoretical proof based on stochastic approximation derives sufficient conditions that guarantee convergence, and indicates that the algorithm's performance relies mostly on the design of coordinated learning rates for the two phases. Finally, a series of numerical experiments is given to discuss the necessity of two-phase updates and the performance under the coordinated learning rates. (C) 2018 Elsevier Inc. All rights reserved.
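The two-phase idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' GK-ACD algorithm. It assumes a fixed dictionary of Gaussian-kernel centers, uses a TD(0) update of the critic weights as the fast phase, and uses a projected gradient step on the squared TD error with respect to log(sigma) as the slow phase. The 5-state random-walk task, the function names (`features`, `value`, `dvalue_dlogsigma`, `train`), the step-size schedules and the clamp interval for sigma are all hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical toy task (not from the paper): a 5-state random walk.
# Non-terminal states 1..5 sit at x = i/6; stepping into state 6 pays 1,
# stepping into state 0 pays 0, so the true values are V(i) = i/6.
N_STATES = 5
CENTERS = [i / 6 for i in range(1, N_STATES + 1)]  # fixed kernel dictionary (assumed)


def features(x, sigma):
    """Gaussian-kernel features k(x, c) = exp(-(x - c)^2 / (2 sigma^2))."""
    return [math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for c in CENTERS]


def value(alpha, x, sigma):
    """Critic output V(x) = sum_j alpha_j k(x, c_j)."""
    return sum(a * f for a, f in zip(alpha, features(x, sigma)))


def dvalue_dlogsigma(alpha, x, sigma):
    """Derivative of V(x) with respect to log(sigma)."""
    return sum(a * f * (x - c) ** 2 / sigma ** 2
               for a, f, c in zip(alpha, features(x, sigma), CENTERS))


def train(episodes=2000, gamma=1.0, sigma0=0.1, seed=0):
    rng = random.Random(seed)
    alpha = [0.0] * N_STATES
    log_sigma = math.log(sigma0)
    lo, hi = math.log(0.05), math.log(0.5)  # projection interval for sigma (assumed)
    t = 0
    for _ in range(episodes):
        s = 3  # every episode starts in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))
            terminal = s_next in (0, 6)
            r = 1.0 if s_next == 6 else 0.0
            sigma = math.exp(log_sigma)
            x, x_next = s / 6, s_next / 6
            v_next = 0.0 if terminal else value(alpha, x_next, sigma)
            delta = r + gamma * v_next - value(alpha, x, sigma)  # TD error

            a_t = 0.1 / (1 + t / 1000) ** 0.6  # fast step size: critic weights
            b_t = 0.01 / (1 + t / 1000)        # slow step size: kernel width

            # Gradient of 0.5 * delta^2 w.r.t. log(sigma) at the current point.
            d_next = 0.0 if terminal else dvalue_dlogsigma(alpha, x_next, sigma)
            grad = delta * (gamma * d_next - dvalue_dlogsigma(alpha, x, sigma))

            # Phase 1 (fast): TD(0) update of the kernel weights, sigma held fixed.
            phi = features(x, sigma)
            alpha = [a + a_t * delta * f for a, f in zip(alpha, phi)]

            # Phase 2 (slow): projected gradient step on the hyper-parameter.
            log_sigma = min(hi, max(lo, log_sigma - b_t * grad))

            t += 1
            if terminal:
                break
            s = s_next

    sigma = math.exp(log_sigma)
    return [value(alpha, i / 6, sigma) for i in range(1, N_STATES + 1)], sigma


if __name__ == "__main__":
    values, sigma = train()
    print("V estimates:", [round(v, 3) for v in values])
    print("true values:", [round(i / 6, 3) for i in range(1, N_STATES + 1)])
    print("sigma:", round(sigma, 3))
```

Calling `values, sigma = train()` returns the five state-value estimates and the adapted kernel width. In this sketch the hyper-parameter step size `b_t` decays faster than the critic step size `a_t`, so their ratio tends to zero; this two-timescale schedule is one common way to coordinate two coupled stochastic-approximation updates, and is only an assumed stand-in for the coordinated learning rates analysed in the paper.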


Pages: 139-155

Page count: 17

Number of references: 38
