An Adaptive Policy Evaluation Network Based on Recursive Least Squares Temporal Difference With Gradient Correction

Cited by: 3
Authors
Li, Dazi [1 ]
Wang, Yuting [1 ]
Song, Tianheng [1 ]
Jin, Qibing [1 ]
Affiliations
[1] Beijing Univ Chem Technol, Coll Informat Sci & Technol, Beijing 100029, Peoples R China
Source
IEEE ACCESS | 2018 / Vol. 6
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
Policy evaluation; reinforcement learning; recursive least squares temporal difference with gradient correction; value function approximation;
DOI
10.1109/ACCESS.2018.2805298
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Reinforcement learning (RL) is an important machine learning paradigm that can be used for learning from data obtained through human-computer interfaces and from interaction in human-centered smart systems. One of the essential problems in RL algorithms is estimating value functions, which are usually approximated by linearly parameterized functions. Prior RL algorithms that generalize in this way spend their learning time tuning the linear weights while leaving the basis functions fixed, even though the basis functions also have a significant influence on approximation performance. In this paper, a new adaptive policy evaluation network based on recursive least squares temporal difference (TD) with gradient correction (adaptive RC network) is proposed. The basis functions in the proposed algorithm are adaptively optimized, mainly with respect to their widths. The TD error and the value function are estimated by the RC algorithm through value function approximation, and the gradient of the squared TD error is used to update the widths of the basis functions. The RC network can therefore adjust its network parameters adaptively, in a self-organizing way, according to the progress of learning. Empirical results on three RL benchmarks demonstrate the performance and applicability of the proposed adaptive RC network.
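As a concrete illustration of the width-adaptation step described in the abstract, the following is a minimal Python sketch, not the paper's implementation: Gaussian radial basis features approximate the value function, and their widths are adjusted by gradient descent on the squared TD error. A plain semi-gradient TD(0) weight update stands in for the paper's recursive least-squares RC recursion, and the toy transition model, reward, centers, and step sizes are all invented for the example.

import numpy as np

# Minimal sketch (assumptions, not the paper's method): Gaussian RBF features,
# a semi-gradient TD(0) weight update in place of the RC recursion, and a
# toy 1-D chain with an illustrative reward.
rng = np.random.default_rng(0)
centers = np.linspace(0.0, 1.0, 10)      # fixed RBF centers over a 1-D state
widths = np.full_like(centers, 0.2)      # widths sigma_i, adapted online
w = np.zeros_like(centers)               # linear weights of V(s) = w^T phi(s)
alpha_w, alpha_sigma, gamma = 0.1, 0.01, 0.95

def phi(s, widths):
    """Gaussian RBF features phi_i(s) = exp(-(s - c_i)^2 / (2 sigma_i^2))."""
    return np.exp(-(s - centers) ** 2 / (2.0 * widths ** 2))

for _ in range(5000):
    s = rng.random()                                          # toy state
    s_next = float(np.clip(s + 0.1 * rng.standard_normal(), 0.0, 1.0))
    r = -abs(s_next - 0.5)                                    # toy reward

    f, f_next = phi(s, widths), phi(s_next, widths)
    delta = r + gamma * (w @ f_next) - w @ f                  # TD error

    # Weight update (TD(0) stand-in for the RC recursion).
    w += alpha_w * delta * f

    # Width update: gradient descent on delta^2 w.r.t. sigma_i, holding the
    # bootstrapped target fixed (semi-gradient), which is one reading of the
    # abstract's self-organizing adjustment.
    dphi_dsigma = f * (s - centers) ** 2 / widths ** 3        # d phi_i / d sigma_i
    widths += alpha_sigma * delta * (w * dphi_dsigma)
    widths = np.clip(widths, 1e-2, 1.0)   # keep widths from collapsing or exploding

Holding the bootstrapped target fixed when differentiating the squared TD error keeps the width update consistent with the TD-style weight update; the final clipping line is only a numerical safeguard for this toy example.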
Pages: 7515-7525
Page count: 11