Target transfer Q-learning and its convergence analysis

Cited by: 0

Authors:

Wang Y. [1]

Liu Y. [1]

Chen W. [2]

Ma Z.-M. [3]

Liu T.-Y. [2]

Affiliations:

[1] School of Science, Beijing Jiaotong University

[2] Academy of Mathematics and Systems Science, Chinese Academy of Sciences

Source:

Neurocomputing | 2020 / Vol. 392

Keywords:

Convergence analysis; Q-learning; Reinforcement learning; Transfer learning

DOI:

10.1016/j.neucom.2020.02.117

Abstract:

Reinforcement learning (RL) techniques are powerful tools for learning how to interact with environments and have been successfully applied to a variety of important applications. Q-learning, one of the most popular RL methods, uses the Bellman equation to update the Q-function. Because data collection in RL is costly and time-consuming and Q-learning converges slowly, various transfer RL algorithms have been designed to improve the sample complexity of new tasks. However, most previous transfer RL algorithms resemble the transfer learning methods used in deep learning and are heuristic, with no theoretical guarantee on the convergence rate. It is therefore important to understand clearly how and when transfer learning helps RL methods, and to provide a theoretical guarantee for the improvement in sample complexity. In this paper, we rethink the transfer RL problem from the RL perspective and propose to transfer the Q-function learned in the old task to the target Q-function in the Q-learning of the new task. We call this new transfer Q-learning method target transfer Q-learning (TTQL). The transfer process is controlled by an error condition, which helps avoid harm to the new task caused by the transferred target. We design the error condition in TTQL as whether the Bellman error of the transferred target Q-function is less than that of the current Q-function. We show that TTQL with the error condition achieves a faster convergence rate than Q-learning. Our experiments are consistent with our theoretical results and verify the effectiveness of the proposed target transfer Q-learning method. © 2020 Elsevier B.V.
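
The error condition described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: it assumes a small MDP whose transition tensor `P` and reward matrix `R` are known, so that the Bellman error can be computed exactly as a max-norm residual, whereas the paper may estimate it differently. The names `bellman_error` and `ttql_sketch`, the uniform exploration policy, and all default parameters are hypothetical choices made for illustration.

```python
import numpy as np


def bellman_error(Q, P, R, gamma):
    """Max-norm Bellman residual ||TQ - Q||_inf.

    Assumption: the model is known. P has shape (S, A, S) and holds
    transition probabilities; R has shape (S, A) and holds expected rewards.
    """
    TQ = R + gamma * (P @ Q.max(axis=1))  # Bellman optimality backup, shape (S, A)
    return np.abs(TQ - Q).max()


def ttql_sketch(P, R, Q_source, gamma=0.9, alpha=0.5, steps=2000, seed=0):
    """Tabular Q-learning whose bootstrap target may come from a source task.

    Illustrative only: at each step the transferred Q-function is used as
    the target Q-function only while its Bellman error on the new task is
    smaller than that of the current estimate (the error condition).
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    source_err = bellman_error(Q_source, P, R, gamma)  # fixed, so computed once
    s = 0
    for _ in range(steps):
        a = rng.integers(n_actions)              # uniform exploration (assumption)
        s_next = rng.choice(n_states, p=P[s, a])
        # Error condition: fall back to ordinary Q-learning when the
        # transferred target is no better than the current estimate.
        use_source = source_err < bellman_error(Q, P, R, gamma)
        Q_target = Q_source if use_source else Q
        td_target = R[s, a] + gamma * Q_target[s_next].max()
        Q[s, a] += alpha * (td_target - Q[s, a])
        s = s_next
    return Q
```

Under these assumptions, a source Q-function with a large Bellman error on the new task is never used as the target, so the update reduces to plain Q-learning; this is the sense in which the error condition guards against harmful transfer.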

Pages: 11-22

Page count: 11
