Sample Complexity and Overparameterization Bounds for Temporal-Difference Learning With Neural Network Approximation

Cited by: 2
Authors
Cayci, Semih [1 ,2 ]
Satpathi, Siddhartha [3 ,4 ]
He, Niao [5 ]
Srikant, R. [1 ,6 ]
Affiliations
[1] Univ Illinois, Coordinated Sci Lab, Urbana, IL 61801 USA
[2] Rhein Westfal TH Aachen, Chair Math Informat Proc, D-52062 Aachen, Germany
[3] Univ Illinois, Urbana, IL 61801 USA
[4] Mayo Clin, Rochester, MN 55902 USA
[5] Swiss Fed Inst Technol, Dept Comp Sci, CH-8006 Zurich, Switzerland
[6] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
Funding
Swiss National Science Foundation; US National Science Foundation
Keywords
Neural networks; Approximation algorithms; Markov processes; Convergence; Complexity theory; Reinforcement learning; Kernel; reinforcement learning (RL); stochastic approximation; temporal-difference (TD) learning;
DOI
10.1109/TAC.2023.3234234
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Discipline Code
0812
Abstract
In this article, we study the dynamics of temporal-difference (TD) learning with neural network-based value function approximation over a general state space, namely, neural TD learning. We consider two algorithms used in practice, projection-free and max-norm regularized neural TD learning, and establish the first convergence bounds for these algorithms. An interesting observation from our results is that max-norm regularization can dramatically improve the performance of TD learning algorithms in terms of sample complexity and overparameterization. The results in this work rely on a Lyapunov drift analysis of the network parameters as a stopped and controlled random process.
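The abstract describes semi-gradient TD learning with a neural value function, where max-norm regularization keeps the network parameters in a bounded set. A minimal sketch of that idea, assuming a two-layer ReLU network with fixed output weights, a toy continuous-state random walk, and illustrative step size and projection radius (none of these constants come from the paper itself):

```python
import numpy as np

# Sketch of TD(0) with a two-layer ReLU value network and max-norm
# regularization. The MDP, step size, and projection radius are
# illustrative assumptions, not the paper's exact algorithm.

rng = np.random.default_rng(0)
d, m = 4, 64          # state dimension, network width
gamma = 0.9           # discount factor
alpha = 0.01          # step size
radius = 2.0          # max-norm ball radius for each neuron's weight row

W = rng.normal(scale=1.0 / np.sqrt(d), size=(m, d))   # hidden weights (trained)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)      # output weights (fixed)

def value(s, W):
    """Two-layer ReLU network value estimate V(s) = a . ReLU(W s)."""
    return a @ np.maximum(W @ s, 0.0)

def td_step(s, r, s_next, W):
    """One semi-gradient TD(0) update followed by max-norm projection."""
    delta = r + gamma * value(s_next, W) - value(s, W)   # TD error
    grad = (a * (W @ s > 0.0))[:, None] * s[None, :]     # dV(s)/dW
    W = W + alpha * delta * grad
    # Max-norm regularization: project each neuron's weight row back
    # into the Euclidean ball of the given radius.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    W = W * np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    return W, delta

# Toy simulation: contracting random-walk states, reward = first coordinate.
s = rng.normal(size=d)
for _ in range(200):
    s_next = 0.9 * s + 0.1 * rng.normal(size=d)
    W, delta = td_step(s, s[0], s_next, W)
    s = s_next
```

The projection step is what distinguishes the max-norm regularized variant from the projection-free one; dropping it recovers plain semi-gradient TD(0), at the cost (per the abstract) of weaker sample-complexity and overparameterization guarantees.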
Pages: 2891-2905 (15 pages)
Related Papers
50 records total
  • [1] An analysis of temporal-difference learning with function approximation
    Tsitsiklis, JN
    Van Roy, B
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1997, 42 (05) : 674 - 690
  • [2] An Analysis of Quantile Temporal-Difference Learning
    Rowland, Mark
    Munos, Remi
    Azar, Mohammad Gheshlaghi
    Tang, Yunhao
    Ostrovski, Georg
    Harutyunyan, Anna
    Tuyls, Karl
    Bellemare, Marc G.
    Dabney, Will
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [3] On the convergence of temporal-difference learning with linear function approximation
    Tadic, V
    MACHINE LEARNING, 2001, 42 (03) : 241 - 267
  • [5] Adaptive temporal-difference learning via deep neural network function approximation: a non-asymptotic analysis
    Wang, Guoyong
    Fu, Tiange
    Zheng, Ruijuan
    Zhao, Xuhui
    Zhu, Junlong
    Zhang, Mingchuan
    COMPLEX & INTELLIGENT SYSTEMS, 2025, 11 (02)
  • [6] Temporal-difference learning with nonlinear function approximation: lazy training and mean field regimes
    Agazzi, Andrea
    Lu, Jianfeng
    MATHEMATICAL AND SCIENTIFIC MACHINE LEARNING, VOL 145, 2021, 145 : 37 - 74
  • [7] Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation
    Doan, Thinh T.
    Maguluri, Siva Theja
    Romberg, Justin
    SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2021, 3 (01): : 298 - 320
  • [8] On Generalized Bellman Equations and Temporal-Difference Learning
    Yu, Huizhen
    Mahmood, A. Rupam
    Sutton, Richard S.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2018, 19
  • [9] New Versions of Gradient Temporal-Difference Learning
    Lee, Donghwan
    Lim, Han-Dong
    Park, Jihoon
    Choi, Okyong
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2023, 68 (08) : 5006 - 5013
  • [10] Average cost temporal-difference learning
    Tsitsiklis, JN
    Van Roy, B
    AUTOMATICA, 1999, 35 (11) : 1799 - 1808