Finite-Time Theory for Momentum Q-learning

Times cited: 0
Authors
Weng, Bowen [1 ]
Xiong, Huaqing [1 ]
Zhao, Lin [2 ]
Liang, Yingbin [1 ]
Zhang, Wei [3 ]
Affiliations
[1] Ohio State Univ, Dept Elect & Comp Engn, Columbus, OH 43210 USA
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[3] Southern Univ Sci & Technol, Dept Mech & Energy Engn, Shenzhen, Peoples R China
Source
UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2021, Vol. 161
Funding
U.S. National Science Foundation; National Natural Science Foundation of China
Keywords
Stochastic approximation; Convergence; Rates
DOI
Not available
CLC classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Existing studies indicate that momentum ideas from conventional optimization can improve the performance of Q-learning algorithms. However, finite-time analysis of momentum-based Q-learning has so far been available only for the tabular case without function approximation. This paper analyzes a class of momentum-based Q-learning algorithms with a finite-time convergence guarantee. Specifically, we propose the MomentumQ algorithm, which integrates Nesterov's and Polyak's momentum schemes and generalizes existing momentum-based Q-learning algorithms. For the infinite state-action space case, we establish a convergence guarantee for MomentumQ with linear function approximation under Markovian sampling. In particular, we characterize a finite-time convergence rate that is provably faster than that of vanilla Q-learning. This is the first finite-time analysis of momentum-based Q-learning algorithms with function approximation. For the tabular case under synchronous sampling, we also obtain a finite-time convergence rate that is slightly better than that of SpeedyQ [Azar et al., 2011]. Finally, we demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
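To illustrate the momentum idea the abstract describes, below is a minimal tabular sketch: a standard TD update plus a Polyak-style momentum term. This is not the paper's exact MomentumQ rule (which interpolates Nesterov- and Polyak-type schemes); the step sizes, momentum coefficient `beta`, and the toy MDP are all assumptions chosen for illustration.

```python
import numpy as np

def momentum_q_step(Q, Q_prev, s, a, r, s_next, alpha=0.5, beta=0.1, gamma=0.9):
    """One tabular momentum Q-learning update (illustrative sketch).

    Applies the usual TD step plus a Polyak-style momentum term
    beta * (Q[s, a] - Q_prev[s, a]) built from the previous iterate.
    """
    td_error = r + gamma * Q[s_next].max() - Q[s, a]
    Q_new = Q.copy()
    Q_new[s, a] += alpha * td_error + beta * (Q[s, a] - Q_prev[s, a])
    return Q_new

# Toy deterministic chain: state 0 --(reward 1)--> state 1, which is
# absorbing with reward 0, so the optimal values are Q(0,0)=1, Q(1,0)=0.
Q = np.zeros((2, 1))
Q_prev = Q.copy()
for _ in range(50):
    Q, Q_prev = momentum_q_step(Q, Q_prev, s=0, a=0, r=1.0, s_next=1), Q
```

With these constants the update for Q(0, 0) is the linear recurrence x_{k+1} = 0.6 x_k - 0.1 x_{k-1} + 0.5, whose characteristic roots have modulus below 1, so the iterates converge to the fixed point 1.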
Pages: 665-674
Page count: 10
References
43 in total
[1]  
[Anonymous], 2011, Proc. Adv. Neural Inf. Process. Syst.
[2]  
Baird L., 1995, Machine Learning. Proceedings of the Twelfth International Conference on Machine Learning, P30
[3]  
Bertsekas D. P., 1996, Neuro-dynamic Programming
[4]  
Bhandari J., 2018, Conference on Learning Theory (COLT)
[5]
Borkar V. S., Meyn S. P., 2000, The ODE method for convergence of stochastic approximation and reinforcement learning, SIAM Journal on Control and Optimization, 38(2): 447-469
[6]  
Cai Z., 2019, Advances in Neural Information Processing Systems, Vol. 32, p. 11312
[7]  
Castillo G. A., 2019, IEEE International Conference on Robotics and Automation (ICRA), p. 284, DOI 10.1109/ICRA.2019.8793627
[8]  
Chen Z., 2019, preprint
[9]  
Devraj A. M., 2019, Annual Allerton Conference on Communication, Control, and Computing, p. 749, DOI 10.1109/ALLERTON.2019.8919828
[10]  
Devraj A. M., 2017, Advances in Neural Information Processing Systems, Vol. 30