Sample Complexity and Overparameterization Bounds for Temporal-Difference Learning With Neural Network Approximation

Cited by: 2
Authors
Cayci, Semih [1 ,2 ]
Satpathi, Siddhartha [3 ,4 ]
He, Niao [5 ]
Srikant, R. [1 ,6 ]
Affiliations
[1] Univ Illinois, Coordinated Sci Lab, Urbana, IL 61801 USA
[2] Rhein Westfal TH Aachen, Chair Math Informat Proc, D-52062 Aachen, Germany
[3] Univ Illinois, Urbana, IL 61801 USA
[4] Mayo Clin, Rochester, MN 55902 USA
[5] Swiss Fed Inst Technol, Dept Comp Sci, CH-8006 Zurich, Switzerland
[6] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
Funding
Swiss National Science Foundation; US National Science Foundation
Keywords
Neural networks; Approximation algorithms; Markov processes; Convergence; Complexity theory; Reinforcement learning; Kernel; reinforcement learning (RL); stochastic approximation; temporal-difference (TD) learning;
DOI
10.1109/TAC.2023.3234234
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Discipline Code
0812
Abstract
In this article, we study the dynamics of temporal-difference (TD) learning with neural network-based value function approximation over a general state space, namely, neural TD learning. We consider two algorithms used in practice, projection-free and max-norm regularized neural TD learning, and establish the first convergence bounds for these algorithms. An interesting observation from our results is that max-norm regularization can dramatically improve the performance of TD learning algorithms in terms of sample complexity and overparameterization. The results in this work rely on a Lyapunov drift analysis of the network parameters as a stopped and controlled random process.
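The abstract describes semi-gradient TD learning with a neural value function, where max-norm regularization keeps the network parameters in a bounded set. A minimal sketch of that idea, assuming a two-layer ReLU network with fixed output weights, a toy continuous-state random walk, and illustrative step size and projection radius (none of these constants come from the paper itself):

```python
import numpy as np

# Sketch of TD(0) with a two-layer ReLU value network and max-norm
# regularization. The MDP, step size, and projection radius are
# illustrative assumptions, not the paper's exact algorithm.

rng = np.random.default_rng(0)
d, m = 4, 64          # state dimension, network width
gamma = 0.9           # discount factor
alpha = 0.01          # step size
radius = 2.0          # max-norm ball radius for each neuron's weight row

W = rng.normal(scale=1.0 / np.sqrt(d), size=(m, d))   # hidden weights (trained)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)      # output weights (fixed)

def value(s, W):
    """Two-layer ReLU network value estimate V(s) = a . ReLU(W s)."""
    return a @ np.maximum(W @ s, 0.0)

def td_step(s, r, s_next, W):
    """One semi-gradient TD(0) update followed by max-norm projection."""
    delta = r + gamma * value(s_next, W) - value(s, W)   # TD error
    grad = (a * (W @ s > 0.0))[:, None] * s[None, :]     # dV(s)/dW
    W = W + alpha * delta * grad
    # Max-norm regularization: project each neuron's weight row back
    # into the Euclidean ball of the given radius.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    W = W * np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    return W, delta

# Toy simulation: contracting random-walk states, reward = first coordinate.
s = rng.normal(size=d)
for _ in range(200):
    s_next = 0.9 * s + 0.1 * rng.normal(size=d)
    W, delta = td_step(s, s[0], s_next, W)
    s = s_next
```

The projection step is what distinguishes the max-norm regularized variant from the projection-free one; dropping it recovers plain semi-gradient TD(0), at the cost (per the abstract) of weaker sample-complexity and overparameterization guarantees.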
Pages: 2891-2905 (15 pages)
Related Papers
50 records total
  • [1] An analysis of temporal-difference learning with function approximation
    Tsitsiklis, JN
    Van Roy, B
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1997, 42 (05) : 674 - 690
  • [2] An Analysis of Quantile Temporal-Difference Learning
    Rowland, Mark
    Munos, Remi
    Azar, Mohammad Gheshlaghi
    Tang, Yunhao
    Ostrovski, Georg
    Harutyunyan, Anna
    Tuyls, Karl
    Bellemare, Marc G.
    Dabney, Will
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [3] On the convergence of temporal-difference learning with linear function approximation
    Tadic, V
    MACHINE LEARNING, 2001, 42 (03) : 241 - 267
  • [5] Adaptive temporal-difference learning via deep neural network function approximation: a non-asymptotic analysis
    Wang, Guoyong
    Fu, Tiange
    Zheng, Ruijuan
    Zhao, Xuhui
    Zhu, Junlong
    Zhang, Mingchuan
    COMPLEX & INTELLIGENT SYSTEMS, 2025, 11 (02)
  • [6] Temporal-difference learning with nonlinear function approximation: lazy training and mean field regimes
    Agazzi, Andrea
    Lu, Jianfeng
    MATHEMATICAL AND SCIENTIFIC MACHINE LEARNING, VOL 145, 2021, 145 : 37 - 74
  • [7] Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation
    Doan, Thinh T.
    Maguluri, Siva Theja
    Romberg, Justin
    SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2021, 3 (01): : 298 - 320
  • [8] On Generalized Bellman Equations and Temporal-Difference Learning
    Yu, Huizhen
    Mahmood, A. Rupam
    Sutton, Richard S.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2018, 19
  • [9] New Versions of Gradient Temporal-Difference Learning
    Lee, Donghwan
    Lim, Han-Dong
    Park, Jihoon
    Choi, Okyong
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2023, 68 (08) : 5006 - 5013
  • [10] Average cost temporal-difference learning
    Tsitsiklis, JN
    Van Roy, B
    AUTOMATICA, 1999, 35 (11) : 1799 - 1808