A hybrid transfer algorithm for reinforcement learning based on spectral method

Cited by: 0
Authors
Institutions
[1] School of Information and Electrical Engineering, China University of Mining and Technology
Source
Zhu, M.-Q. (zhumeiqiang@cumt.edu.cn) | 2012 / Science Press / Vol. 38
Keywords
Hierarchical decomposition; Proto-value functions; Reinforcement learning; Spectral graph theory; Transfer learning;
DOI
10.3724/SP.J.1004.2012.01765
Abstract
When scaling up state-space transfer under the proto-value function framework, only the basis functions corresponding to the smaller eigenvalues transfer effectively, which leads to an incorrect approximation of the value function in the target task. To address this problem, exploiting the fact that the Laplacian eigenmap preserves the local topology of the state space, an improved hierarchical decomposition algorithm based on spectral graph theory is proposed, and a hybrid transfer method that integrates basis-function transfer with the transfer of optimal subtask policies is designed. First, the basis functions of the source task are constructed by the spectral method, and the basis functions of the target task are produced by linearly interpolating those of the source task. Second, the second basis function of the target task (which approximates the Fiedler eigenvector) is used to decompose the target task, and the optimal policies of the resulting subtasks are obtained with the improved hierarchical decomposition algorithm. Finally, the obtained basis functions and optimal subtask policies are transferred to the target task. The proposed hybrid transfer method can directly yield optimal policies for some states, reduce the number of iterations, and lower the minimum number of basis functions needed to approximate the value function. The method is suitable for scaling up state-space transfer in tasks with a hierarchical control structure. Simulation results on grid worlds verify the validity of the proposed hybrid transfer method. © 2012 Acta Automatica Sinica.
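As a rough illustration of the pipeline the abstract describes (a minimal sketch, not the authors' code), the snippet below builds proto-value functions for a small grid world from the normalized graph Laplacian, uses the second eigenvector (the Fiedler vector) to split the state space into two subtasks by sign, and linearly interpolates a source-task basis function onto a larger target grid. The grid sizes, 4-connected topology, and dense NumPy routines are illustrative assumptions.

```python
import numpy as np

def grid_adjacency(n):
    """Adjacency matrix of an n x n grid world (4-connected states)."""
    N = n * n
    W = np.zeros((N, N))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if c + 1 < n:                      # right neighbour
                W[i, i + 1] = W[i + 1, i] = 1.0
            if r + 1 < n:                      # down neighbour
                W[i, i + n] = W[i + n, i] = 1.0
    return W

def proto_value_functions(W, k):
    """First k eigenvectors (smallest eigenvalues) of the normalized
    graph Laplacian; the columns serve as basis functions (PVFs)."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L)             # ascending eigenvalues
    return vals[:k], vecs[:, :k]

def interpolate_pvf(phi_src, n_src, n_tgt):
    """Transfer one source-task PVF to a larger grid by bilinear
    interpolation (the 'linear interpolation' step of the method)."""
    F = phi_src.reshape(n_src, n_src)
    xs = np.linspace(0.0, n_src - 1, n_tgt)    # target coords in source units
    src = np.arange(n_src)
    rows = np.array([np.interp(xs, src, F[r]) for r in range(n_src)])
    out = np.array([np.interp(xs, src, rows[:, c]) for c in range(n_tgt)])
    return out.T.flatten()

# Source task: a 5 x 5 grid world; build PVFs by the spectral method.
W = grid_adjacency(5)
vals, phi = proto_value_functions(W, k=4)

# The second basis function approximates the Fiedler eigenvector;
# splitting states by its sign decomposes the task into two subtasks.
fiedler = phi[:, 1]
subtask_A = np.where(fiedler >= 0)[0]
subtask_B = np.where(fiedler < 0)[0]

# Target task: a 10 x 10 grid; produce its basis function by
# linearly interpolating the source-task basis function.
phi_tgt = interpolate_pvf(fiedler, 5, 10)
```

The sign split of the Fiedler vector is the standard spectral-bisection heuristic; in the paper this decomposition is what lets optimal subtask policies be reused in the target task.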
Pages: 1765-1776
Page count: 11
References
18 items in total
  • [1] Sutton R.S., Barto A.G., Reinforcement Learning: An Introduction, (1998)
  • [2] Gao Y., Chen S.-F., Lu X., Research on reinforcement learning technology: A review, Acta Automatica Sinica, 30, 1, pp. 86-100, (2004)
  • [3] Zhao D.-B., Liu D.-R., Yi J.-Q., An overview on the adaptive dynamic programming based urban city traffic signal optimal control, Acta Automatica Sinica, 35, 6, pp. 676-681, (2009)
  • [4] Barto A.G., Mahadevan S., Recent advances in hierarchical reinforcement learning, Discrete Event Dynamic Systems, 13, 4, pp. 341-379, (2003)
  • [5] Pan S.J., Yang Q., A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering, 22, 10, pp. 1345-1359, (2010)
  • [6] Taylor M.E., Stone P., Transfer learning for reinforcement learning domains: A survey, The Journal of Machine Learning Research, 10, pp. 1633-1685, (2009)
  • [7] Wang H., Gao Y., Cheng X.-G., Transfer of reinforcement learning: The state of the art, Acta Electronica Sinica, 36, 12A, pp. 39-43, (2008)
  • [8] Mahadevan S., Maggioni M., Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes, The Journal of Machine Learning Research, 8, pp. 2169-2231, (2007)
  • [9] Chiu C.C., Soo V.W., Automatic complexity reduction in reinforcement learning, Computational Intelligence, 26, 1, pp. 1-25, (2010)
  • [10] Simsek O., Wolfe A.P., Barto A.G., Identifying useful sub-goals in reinforcement learning by local graph partitioning, Proceedings of the 22nd International Conference on Machine Learning, pp. 816-823, (2005)