Neural Temporal-Difference Learning Converges to Global Optima

Cited by: 0

Authors:

Cai, Qi [1]

Yang, Zhuoran [2]

Lee, Jason D. [3]

Wang, Zhaoran [1]

Affiliations:

[1] Northwestern Univ, Dept Ind Engn & Management Sci, Evanston, IL 60208 USA

[2] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA

[3] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | 2019 / Vol. 32

Keywords:

ALGORITHMS;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning. However, due to the nonlinearity in value function approximation, such a coupling leads to non-convexity and even divergence in optimization. As a result, the global convergence of neural TD remains unclear. In this paper, we prove for the first time that neural TD converges at a sublinear rate to the global optimum of the mean-squared projected Bellman error for policy evaluation. In particular, we show how such global convergence is enabled by the overparametrization of neural networks, which also plays a vital role in the empirical success of neural TD.


Pages: 12
