Target transfer Q-learning and its convergence analysis

Cited by: 0
Authors
Wang Y. [1 ]
Liu Y. [1 ]
Chen W. [2 ]
Ma Z.-M. [3 ]
Liu T.-Y. [2 ]
Affiliations
[1] School of Science, Beijing Jiaotong University
[2] Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Keywords
Convergence analysis; Q-learning; Reinforcement learning; Transfer learning
DOI
10.1016/j.neucom.2020.02.117
Abstract
Reinforcement learning (RL) methods learn how to interact with an environment and have been successfully applied to many important applications. Q-learning is one of the most popular RL methods; it leverages the Bellman equation to update the Q-function. Because data collection in RL is both time-consuming and costly, and Q-learning converges slowly, various transfer RL algorithms have been designed to improve the sample complexity on new tasks. However, most previous transfer RL algorithms resemble transfer learning methods in deep learning: they are heuristic and carry no theoretical guarantee on the convergence rate. It is therefore important to understand clearly how and when transfer learning helps RL and to provide theoretical guarantees for the improvement in sample complexity. In this paper, we rethink transfer RL from the RL perspective and propose to transfer the Q-function learned in the old task to serve as the target Q-function in the Q-learning of the new task. We call this new transfer Q-learning method target transfer Q-learning (abbrev. TTQL). The transfer process is controlled by an error condition that helps avoid the harm a transferred target could bring to the new task. We design the error condition in TTQL as whether the Bellman error of the transferred target Q-function is smaller than that of the current Q-function. We show that TTQL with this error condition achieves a faster convergence rate than Q-learning. Our experiments are consistent with the theoretical results and verify the effectiveness of the proposed target transfer Q-learning method. © 2020 Elsevier B.V.
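The abstract describes the TTQL update only in words. The tabular sketch below is our own illustrative reading of that description, not the authors' published pseudocode; the function names (`ttql_update`, `td_error`) and the per-transition form of the Bellman-error test are assumptions made for clarity. It shows the key idea: the Q-function transferred from the old task supplies the target only when its (sampled) Bellman error is smaller than that of the current Q-function, otherwise the step falls back to standard Q-learning.

```python
import numpy as np

def td_error(Q, s, a, r, s_next, gamma):
    """Sampled Bellman (TD) error of Q-table Q on one transition (s, a, r, s_next)."""
    return r + gamma * np.max(Q[s_next]) - Q[s, a]

def ttql_update(Q, Q_old, s, a, r, s_next, alpha, gamma):
    """One tabular TTQL-style step (illustrative sketch under our assumptions).

    Q     : current Q-table for the new task, shape (n_states, n_actions)
    Q_old : Q-table transferred from the old task, same shape
    """
    # Error condition: use the transferred Q-function as the target only if
    # its Bellman error on this transition is smaller than the current one's.
    if abs(td_error(Q_old, s, a, r, s_next, gamma)) < abs(td_error(Q, s, a, r, s_next, gamma)):
        target = r + gamma * np.max(Q_old[s_next])  # transferred target
    else:
        target = r + gamma * np.max(Q[s_next])      # plain Q-learning target
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

In this reading, the error condition acts as a guard: a poor transferred Q-function fails the test and the method degrades gracefully to ordinary Q-learning rather than being harmed by the transfer.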
Pages: 11-22
Page count: 11