Sample Complexity and Overparameterization Bounds for Temporal-Difference Learning With Neural Network Approximation

Cited by: 2
Authors
Cayci, Semih [1 ,2 ]
Satpathi, Siddhartha [3 ,4 ]
He, Niao [5 ]
Srikant, R. [1 ,6 ]
Affiliations
[1] Univ Illinois, Coordinated Sci Lab, Urbana, IL 61801 USA
[2] Rhein Westfal TH Aachen, Chair Math Informat Proc, D-52062 Aachen, Germany
[3] Univ Illinois, Urbana, IL 61801 USA
[4] Mayo Clin, Rochester, MN 55902 USA
[5] Swiss Fed Inst Technol, Dept Comp Sci, CH-8006 Zurich, Switzerland
[6] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
Funding
Swiss National Science Foundation; US National Science Foundation
Keywords
Neural networks; Approximation algorithms; Markov processes; Convergence; Complexity theory; Reinforcement learning; Kernel; reinforcement learning (RL); stochastic approximation; temporal-difference (TD) learning;
DOI
10.1109/TAC.2023.3234234
Chinese Library Classification (CLC)
TP [automation and computer technology];
Discipline code
0812 ;
Abstract
In this article, we study the dynamics of temporal-difference (TD) learning with neural network-based value function approximation over a general state space, namely, neural TD learning. We consider two practically used algorithms, projection-free and max-norm regularized neural TD learning, and establish the first convergence bounds for these algorithms. An interesting observation from our results is that max-norm regularization can dramatically improve the performance of TD learning algorithms in terms of sample complexity and overparameterization. The results in this work rely on a Lyapunov drift analysis of the network parameters as a stopped and controlled random process.
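The abstract describes TD learning with a neural value-function approximator and notes that max-norm regularization improves sample complexity and overparameterization requirements. Below is a minimal, hypothetical sketch of that setup on a toy Markov reward process, not the authors' algorithm: the two-layer ReLU network with fixed output signs, the width m, the step size, the radius R, and the per-neuron projection around initialization are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-state Markov reward process (assumed for illustration).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9

def features(s, d=3):
    x = np.zeros(d)
    x[s] = 1.0          # one-hot state encoding
    return x

# Two-layer ReLU network V(s) = (1/sqrt(m)) * sum_i c_i * relu(w_i . x(s)).
# Only the hidden weights W are trained; output signs c are fixed.
m, d = 64, 3
W0 = rng.normal(size=(m, d))           # initialization, kept for the projection
c = rng.choice([-1.0, 1.0], size=m)    # fixed random output layer
W = W0.copy()

def value(W, x):
    return c @ np.maximum(W @ x, 0.0) / np.sqrt(m)

def grad(W, x):
    # dV/dW_i = (c_i / sqrt(m)) * 1{w_i . x > 0} * x
    act = (W @ x > 0).astype(float)
    return (c * act)[:, None] * x[None, :] / np.sqrt(m)

alpha, R = 0.05, 1.0    # step size and regularization radius (assumed values)
s = 0
for t in range(5000):
    s_next = rng.choice(3, p=P[s])
    x, x_next = features(s), features(s_next)
    delta = r[s] + gamma * value(W, x_next) - value(W, x)  # TD error
    W = W + alpha * delta * grad(W, x)                     # TD(0) update
    # Max-norm-style regularization: project each neuron's weight vector
    # back onto an l2-ball of radius R around its initialization.
    diff = W - W0
    norms = np.linalg.norm(diff, axis=1, keepdims=True)
    W = W0 + diff * np.minimum(1.0, R / np.maximum(norms, 1e-12))
    s = s_next
```

The projection step keeps every neuron within distance R of its initialization, which is one way a per-unit norm constraint can be enforced after each stochastic update; the paper's precise regularizer and analysis should be consulted for the actual algorithm.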
Pages: 2891 - 2905
Page count: 15
Related papers (50 total)
  • [11] A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning
    Choi, David
    Van Roy, Benjamin
    DISCRETE EVENT DYNAMIC SYSTEMS: THEORY AND APPLICATIONS, 2006, 16 (02): : 207 - 239
  • [13] A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning
    Berthier, Eloise
    Kobeissi, Ziad
    Bach, Francis
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [14] Improving Reinforcement Learning Using Temporal-Difference Network
    Karbasian, Habib
    Ahmadabadi, Majid N.
    Araabi, Babak N.
    EUROCON 2009: INTERNATIONAL IEEE CONFERENCE DEVOTED TO THE 150TH ANNIVERSARY OF ALEXANDER S. POPOV, VOLS 1-4, PROCEEDINGS, 2009, : 1716 - 1722
  • [15] An Adaptive Network Slice Combination Algorithm Based on Multistep Temporal-Difference Learning
    Wu, Guomin
    Tan, Guoping
    IEEE WIRELESS COMMUNICATIONS LETTERS, 2022, 11 (06) : 1128 - 1132
  • [16] Weak Convergence Properties of Constrained Emphatic Temporal-difference Learning with Constant and Slowly Diminishing Stepsize
    Yu, Huizhen
    JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17
  • [17] Distributed multi-agent temporal-difference learning with full neighbor information
    Peng, Zhinan
    Hu, Jiangping
    Luo, Rui
    Ghosh, Bijoy K.
    CONTROL THEORY AND TECHNOLOGY, 2020, 18 (04) : 379 - 389
  • [18] Multi-Agent Temporal-Difference Learning with Linear Function Approximation: Weak Convergence under Time-Varying Network Topologies
    Stankovic, Milos S.
    Stankovic, Srdjan S.
    2016 AMERICAN CONTROL CONFERENCE (ACC), 2016, : 167 - 172
  • [19] On the asymptotic behavior of a constant stepsize temporal-difference learning algorithm
    Tadić, Vladislav B.
    COMPUTATIONAL LEARNING THEORY, 1999, 1572 : 126 - 137
  • [20] Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
    Tadić, Vladislav B.
    MACHINE LEARNING, 2006, 63 : 107 - 133