Heuristically accelerated Q-learning algorithm based on Laplacian Eigenmap

Cited: 0
Authors
Zhu, Mei-Qiang [1 ]
Li, Ming [1 ]
Cheng, Yu-Hu [1 ]
Zhang, Qian [1 ]
Wang, Xue-Song [1 ]
Affiliation
[1] School of Information and Electrical Engineering, China University of Mining and Technology
Source
Kongzhi yu Juece/Control and Decision | 2014 / Vol. 29 / No. 3
Keywords
Heuristic policy selection; Laplacian Eigenmap; Q-learning; Reinforcement learning;
DOI
10.13195/j.kzyjc.2012.1669
Abstract
In goal-position-based reinforcement learning, the Euclidean distance is commonly used as the heuristic function for online action selection. However, it does not apply to tasks whose state spaces are not continuous in Euclidean space. To address this problem, the Laplacian Eigenmap, a manifold learning method with relatively low computational complexity, is introduced, and a heuristic policy selection method based on spectral graph theory is proposed. The proposed method is applicable not only to tasks whose state spaces are continuous on some manifold with a good estimate of its intrinsic dimension, but also to tasks whose connectivity can be expressed by an undirected graph. Simulation results on a grid world demonstrate the effectiveness of the proposed method.
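The record does not include the algorithm itself. As a rough illustration of the idea described in the abstract, the sketch below biases Q-learning action selection with a heuristic computed from a Laplacian-eigenmap embedding of the state-transition graph, rather than Euclidean distance in the raw state space. The action rule `argmax_a [Q(s,a) + ξ·H(s,a)]` follows the heuristically accelerated Q-learning scheme of Bianchi et al. (reference [6]); the grid size, embedding dimension, and learning parameters are illustrative assumptions, not values from the paper.

```python
import numpy as np

N = 5                      # illustrative 5x5 grid world
GOAL = N * N - 1           # bottom-right cell is the goal (assumption)
rng = np.random.default_rng(0)

def neighbors(s):
    r, c = divmod(s, N)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < N and 0 <= nc < N:
            yield nr * N + nc

# Adjacency matrix W of the undirected state graph
W = np.zeros((N * N, N * N))
for s in range(N * N):
    for t in neighbors(s):
        W[s, t] = 1.0

# Normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}
d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(N * N) - D_inv_sqrt @ W @ D_inv_sqrt

# Laplacian eigenmap: eigenvectors of the k smallest nonzero eigenvalues,
# skipping the trivial first eigenvector
vals, vecs = np.linalg.eigh(L)
k = 2                      # embedding dimension (illustrative)
embedding = vecs[:, 1:k + 1]

def heuristic(s_next):
    # Prefer successors closer to the goal in the embedded space
    return -np.linalg.norm(embedding[s_next] - embedding[GOAL])

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def step(s, a):
    r, c = divmod(s, N)
    nr, nc = r + ACTIONS[a][0], c + ACTIONS[a][1]
    if 0 <= nr < N and 0 <= nc < N:
        return nr * N + nc
    return s               # bumping into a wall leaves the state unchanged

Q = np.zeros((N * N, 4))
alpha, gamma, xi, eps = 0.5, 0.95, 1.0, 0.1   # illustrative parameters

for episode in range(300):
    s = 0
    for _ in range(100):
        if rng.random() < eps:
            a = int(rng.integers(4))
        else:
            # Heuristically accelerated selection: argmax_a [Q(s,a) + xi*H(s,a)]
            a = int(np.argmax([Q[s, b] + xi * heuristic(step(s, b))
                               for b in range(4)]))
        s2 = step(s, a)
        r = 1.0 if s2 == GOAL else -0.01
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
        if s == GOAL:
            break
```

Because the heuristic is derived from the graph spectrum rather than coordinates, the same construction applies whenever state connectivity is given as an undirected graph, which is the setting the abstract emphasizes.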
Pages: 425-430
Page count: 5
References
12 items in total
  • [1] Sutton R.S., Barto A.G., Reinforcement Learning: An Introduction, pp. 1-5, (1998)
  • [2] Gao Y., Chen S.F., Lu X., Research on reinforcement learning technology: A review, Acta Automatica Sinica, 30, 1, pp. 86-100, (2004)
  • [3] Wu J., Xu X., Wang J., Et al., Recent advances of reinforcement learning in multi-robot systems: A survey, Control and Decision, 26, 11, pp. 1601-1610, (2011)
  • [4] Chen Z.H., Yang Z.H., Wang H.B., Et al., Overview of reinforcement learning from knowledge expression and handling, Control and Decision, 23, 9, pp. 962-968, (2008)
  • [5] Zhu M.Q., Li M., Zhang Q., A dyna Q-learning algorithm in underground path planning, Industrial and Mine Automation, 12, pp. 71-75, (2012)
  • [6] Bianchi R.A.C., Ribeiro C.H.C., Costa A.H.R., Accelerating autonomous learning by using heuristic selection of actions, J of Heuristics, 14, 2, pp. 135-168, (2008)
  • [7] Marek G., Improving exploration in reinforcement learning through domain knowledge and parameter analysis, pp. 34-36, (2010)
  • [8] Bradley K.W., Peter S., Augmenting reinforcement learning with human feedback, The 28th ICML Workshop on New Developments in Imitation Learning, (2011)
  • [9] Belkin M., Niyogi P., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation, 15, 6, pp. 1373-1396, (2003)
  • [10] Zhu M.Q., Cheng Y.H., Li M., Et al., A hybrid transfer algorithm for reinforcement learning based on spectral method, Acta Automatica Sinica, 38, 11, pp. 1765-1776, (2012)