Finite-Time Theory for Momentum Q-learning

Times cited: 0
Authors
Weng, Bowen [1 ]
Xiong, Huaqing [1 ]
Zhao, Lin [2 ]
Liang, Yingbin [1 ]
Zhang, Wei [3 ]
Affiliations
[1] Ohio State Univ, Dept Elect & Comp Engn, Columbus, OH 43210 USA
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[3] Southern Univ Sci & Technol, Dept Mech & Energy Engn, Shenzhen, Peoples R China
Source
UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2021, Vol. 161
Funding
U.S. National Science Foundation; National Natural Science Foundation of China
Keywords
Stochastic approximation; Convergence; Rates
DOI
Not available
CLC classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Existing studies indicate that momentum ideas from conventional optimization can improve the performance of Q-learning algorithms. However, finite-time analysis of momentum-based Q-learning has so far been available only for the tabular case without function approximation. This paper analyzes a class of momentum-based Q-learning algorithms with a finite-time convergence guarantee. Specifically, we propose the MomentumQ algorithm, which integrates Nesterov's and Polyak's momentum schemes and generalizes existing momentum-based Q-learning algorithms. For the infinite state-action space case, we establish a convergence guarantee for MomentumQ with linear function approximation under Markovian sampling. In particular, we characterize a finite-time convergence rate that is provably faster than that of vanilla Q-learning. This is the first finite-time analysis of momentum-based Q-learning algorithms with function approximation. For the tabular case under synchronous sampling, we also obtain a finite-time convergence rate that is slightly better than that of SpeedyQ [Azar et al., 2011]. Finally, we demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
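To illustrate the momentum idea the abstract describes, below is a minimal tabular sketch: a standard TD update plus a Polyak-style momentum term. This is not the paper's exact MomentumQ rule (which interpolates Nesterov- and Polyak-type schemes); the step sizes, momentum coefficient `beta`, and the toy MDP are all assumptions chosen for illustration.

```python
import numpy as np

def momentum_q_step(Q, Q_prev, s, a, r, s_next, alpha=0.5, beta=0.1, gamma=0.9):
    """One tabular momentum Q-learning update (illustrative sketch).

    Applies the usual TD step plus a Polyak-style momentum term
    beta * (Q[s, a] - Q_prev[s, a]) built from the previous iterate.
    """
    td_error = r + gamma * Q[s_next].max() - Q[s, a]
    Q_new = Q.copy()
    Q_new[s, a] += alpha * td_error + beta * (Q[s, a] - Q_prev[s, a])
    return Q_new

# Toy deterministic chain: state 0 --(reward 1)--> state 1, which is
# absorbing with reward 0, so the optimal values are Q(0,0)=1, Q(1,0)=0.
Q = np.zeros((2, 1))
Q_prev = Q.copy()
for _ in range(50):
    Q, Q_prev = momentum_q_step(Q, Q_prev, s=0, a=0, r=1.0, s_next=1), Q
```

With these constants the update for Q(0, 0) is the linear recurrence x_{k+1} = 0.6 x_k - 0.1 x_{k-1} + 0.5, whose characteristic roots have modulus below 1, so the iterates converge to the fixed point 1.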
Pages: 665-674
Page count: 10
References
43 in total
[1]  
[Anonymous], 2011, Proc. Adv. Neural Inf. Process. Syst.
[2]  
Baird L., 1995, Machine Learning. Proceedings of the Twelfth International Conference on Machine Learning, P30
[3]  
Bertsekas D. P., 1996, Neuro-dynamic Programming
[4]  
Bhandari J., 2018, Conference on Learning Theory (COLT)
[5]
Borkar V. S., Meyn S. P., 2000, The ODE method for convergence of stochastic approximation and reinforcement learning, SIAM Journal on Control and Optimization, 38(2): 447-469
[6]  
Cai Z., 2019, Advances in Neural Information Processing Systems, Vol. 32, p. 11312
[7]  
Castillo G. A., 2019, IEEE International Conference on Robotics and Automation (ICRA), p. 284, DOI 10.1109/ICRA.2019.8793627
[8]  
Chen Z., 2019, preprint
[9]  
Devraj A. M., 2019, Annual Allerton Conference on Communication, Control, and Computing, p. 749, DOI 10.1109/ALLERTON.2019.8919828
[10]  
Devraj A. M., 2017, Advances in Neural Information Processing Systems, Vol. 30