Heuristically accelerated Q-learning algorithm based on Laplacian Eigenmap

Cited: 0
Authors
Zhu, Mei-Qiang [1 ]
Li, Ming [1 ]
Cheng, Yu-Hu [1 ]
Zhang, Qian [1 ]
Wang, Xue-Song [1 ]
Affiliation
[1] School of Information and Electrical Engineering, China University of Mining and Technology
Source
Kongzhi yu Juece/Control and Decision | 2014 / Vol. 29 / No. 3
Keywords
Heuristic policy selection; Laplacian Eigenmap; Q-learning; Reinforcement learning;
DOI
10.13195/j.kzyjc.2012.1669
Abstract
In goal-position-based reinforcement learning, the Euclidean distance is commonly used as the heuristic function for online action selection. However, it does not apply to tasks whose state spaces are not continuous in Euclidean space. To address this problem, the Laplacian Eigenmap, a manifold learning method with relatively low computational complexity, is introduced, and a heuristic policy selection method based on spectral graph theory is proposed. The proposed method is applicable not only to tasks whose state spaces are continuous on some manifold with a good estimate of its intrinsic dimension, but also to tasks whose connectivity can be expressed by an undirected graph. Simulation results on a grid world demonstrate the effectiveness of the proposed method.
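The record does not include the algorithm itself. As a rough illustration of the idea described in the abstract, the sketch below biases Q-learning action selection with a heuristic computed from a Laplacian-eigenmap embedding of the state-transition graph, rather than Euclidean distance in the raw state space. The action rule `argmax_a [Q(s,a) + ξ·H(s,a)]` follows the heuristically accelerated Q-learning scheme of Bianchi et al. (reference [6]); the grid size, embedding dimension, and learning parameters are illustrative assumptions, not values from the paper.

```python
import numpy as np

N = 5                      # illustrative 5x5 grid world
GOAL = N * N - 1           # bottom-right cell is the goal (assumption)
rng = np.random.default_rng(0)

def neighbors(s):
    r, c = divmod(s, N)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < N and 0 <= nc < N:
            yield nr * N + nc

# Adjacency matrix W of the undirected state graph
W = np.zeros((N * N, N * N))
for s in range(N * N):
    for t in neighbors(s):
        W[s, t] = 1.0

# Normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}
d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(N * N) - D_inv_sqrt @ W @ D_inv_sqrt

# Laplacian eigenmap: eigenvectors of the k smallest nonzero eigenvalues,
# skipping the trivial first eigenvector
vals, vecs = np.linalg.eigh(L)
k = 2                      # embedding dimension (illustrative)
embedding = vecs[:, 1:k + 1]

def heuristic(s_next):
    # Prefer successors closer to the goal in the embedded space
    return -np.linalg.norm(embedding[s_next] - embedding[GOAL])

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def step(s, a):
    r, c = divmod(s, N)
    nr, nc = r + ACTIONS[a][0], c + ACTIONS[a][1]
    if 0 <= nr < N and 0 <= nc < N:
        return nr * N + nc
    return s               # bumping into a wall leaves the state unchanged

Q = np.zeros((N * N, 4))
alpha, gamma, xi, eps = 0.5, 0.95, 1.0, 0.1   # illustrative parameters

for episode in range(300):
    s = 0
    for _ in range(100):
        if rng.random() < eps:
            a = int(rng.integers(4))
        else:
            # Heuristically accelerated selection: argmax_a [Q(s,a) + xi*H(s,a)]
            a = int(np.argmax([Q[s, b] + xi * heuristic(step(s, b))
                               for b in range(4)]))
        s2 = step(s, a)
        r = 1.0 if s2 == GOAL else -0.01
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
        if s == GOAL:
            break
```

Because the heuristic is derived from the graph spectrum rather than coordinates, the same construction applies whenever state connectivity is given as an undirected graph, which is the setting the abstract emphasizes.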
Pages: 425-430
Page count: 5
References
12 items in total
  • [1] Sutton R.S., Barto A.G., Reinforcement Learning: An Introduction, pp. 1-5, (1998)
  • [2] Gao Y., Chen S.F., Lu X., Research on reinforcement learning technology: A review, Acta Automatica Sinica, 30, 1, pp. 86-100, (2004)
  • [3] Wu J., Xu X., Wang J., Et al., Recent advances of reinforcement learning in multi-robot systems: A survey, Control and Decision, 26, 11, pp. 1601-1610, (2011)
  • [4] Chen Z.H., Yang Z.H., Wang H.B., Et al., Overview of reinforcement learning from knowledge expression and handling, Control and Decision, 23, 9, pp. 962-968, (2008)
  • [5] Zhu M.Q., Li M., Zhang Q., A dyna Q-learning algorithm in underground path planning, Industrial and Mine Automation, 12, pp. 71-75, (2012)
  • [6] Bianchi R.A.C., Ribeiro C.H.C., Costa A.H.R., Accelerating autonomous learning by using heuristic selection of actions, J of Heuristics, 14, 2, pp. 135-168, (2008)
  • [7] Marek G., Improving exploration in reinforcement learning through domain knowledge and parameter analysis, pp. 34-36, (2010)
  • [8] Bradley K.W., Peter S., Augmenting reinforcement learning with human feedback, The 28th ICML Workshop on New Developments in Imitation Learning, (2011)
  • [9] Belkin M., Niyogi P., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation, 15, 6, pp. 1373-1396, (2003)
  • [10] Zhu M.Q., Cheng Y.H., Li M., Et al., A hybrid transfer algorithm for reinforcement learning based on spectral method, Acta Automatica Sinica, 38, 11, pp. 1765-1776, (2012)