Sample Complexity and Overparameterization Bounds for Temporal-Difference Learning With Neural Network Approximation

Cited by: 2
Authors
Cayci, Semih [1 ,2 ]
Satpathi, Siddhartha [3 ,4 ]
He, Niao [5 ]
Srikant, R. [1 ,6 ]
Affiliations
[1] Univ Illinois, Coordinated Sci Lab, Urbana, IL 61801 USA
[2] Rhein Westfal TH Aachen, Chair Math Informat Proc, D-52062 Aachen, Germany
[3] Univ Illinois, Urbana, IL 61801 USA
[4] Mayo Clin, Rochester, MN 55902 USA
[5] Swiss Fed Inst Technol, Dept Comp Sci, CH-8006 Zurich, Switzerland
[6] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
Funding
Swiss National Science Foundation; US National Science Foundation (NSF);
Keywords
Neural networks; Approximation algorithms; Markov processes; Convergence; Complexity theory; Reinforcement learning; Kernel; reinforcement learning (RL); stochastic approximation; temporal-difference (TD) learning;
DOI
10.1109/TAC.2023.3234234
Chinese Library Classification (CLC)
TP [Automation and Computer Technology];
Discipline Code
0812;
Abstract
In this article, we study the dynamics of temporal-difference (TD) learning with neural network-based value function approximation over a general state space, namely, neural TD learning. We consider two practically used algorithms, projection-free and max-norm regularized neural TD learning, and establish the first convergence bounds for these algorithms. An interesting observation from our results is that max-norm regularization can dramatically improve the performance of TD learning algorithms in terms of sample complexity and overparameterization. The results in this work rely on a Lyapunov drift analysis of the network parameters as a stopped and controlled random process.
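The abstract centers on max-norm regularized neural TD learning. As a rough illustration of that idea (not the paper's exact algorithm), the sketch below implements TD(0) with a one-hidden-layer ReLU value network in Python/NumPy, projecting each hidden neuron's weight vector onto an l2-ball after every semi-gradient update; the width M, state dimension D, step size ALPHA, discount GAMMA, and radius LAMBDA are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: one-hidden-layer ReLU value network trained with
# semi-gradient TD(0), where each hidden neuron's weight vector is projected
# onto an l2-ball of radius LAMBDA after every update (one common form of
# max-norm regularization). All constants are assumed for illustration; this
# is not the paper's exact construction.

M, D = 64, 4                                # network width, state dim (assumed)
GAMMA, ALPHA, LAMBDA = 0.9, 0.01, 1.0       # discount, step size, radius

rng = np.random.default_rng(0)
W = rng.normal(size=(M, D)) / np.sqrt(D)    # hidden-layer weights
a = rng.choice([-1.0, 1.0], size=M)         # fixed random output weights

def value(s, W):
    # V(s) = (1/sqrt(M)) * sum_i a_i * relu(w_i . s)
    return a @ np.maximum(W @ s, 0.0) / np.sqrt(M)

def td0_step(s, r, s_next, W):
    # TD error: delta = r + gamma * V(s') - V(s)
    delta = r + GAMMA * value(s_next, W) - value(s, W)
    active = (W @ s > 0.0).astype(float)                     # ReLU gates, (M,)
    grad = (a * active)[:, None] * s[None, :] / np.sqrt(M)   # dV(s)/dW, (M, D)
    W = W + ALPHA * delta * grad
    # Max-norm step: project each row w_i back onto ||w_i||_2 <= LAMBDA.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W * np.minimum(1.0, LAMBDA / np.maximum(norms, 1e-12))

# Toy usage on a single synthetic transition (s, r, s').
s, s_next = rng.normal(size=D), rng.normal(size=D)
W = td0_step(s, 1.0, s_next, W)
print(value(s, W))
```

The per-neuron projection keeps the parameters in a bounded set, which is the kind of property a Lyapunov drift analysis of the parameter process, viewed as a stopped and controlled random process as in the abstract, can exploit.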
Pages
2891-2905 (15 pages)
Related Papers
50 in total
  • [21] Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
    Tadic, VB
    MACHINE LEARNING, 2006, 63 (02) : 107 - 133
  • [22] Implementing Temporal-Difference Learning with the Scaled Conjugate Gradient Algorithm
    Falas, Tasos
    Stafylopatis, Andreas
    NEURAL PROCESSING LETTERS, 2005, 22 (03) : 361 - 375
  • [23] Using temporal-difference learning for multi-agent bargaining
    Huang, Shiu-li
    Lin, Fu-ren
    ELECTRONIC COMMERCE RESEARCH AND APPLICATIONS, 2008, 7 (04) : 432 - 442
  • [24] Temporal-Difference Learning: An Online Support Vector Regression Approach
    Teixeira, Hugo Tanzarella
    Bottura, Celso Pascoli
    ICIMCO 2015 PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON INFORMATICS IN CONTROL, AUTOMATION AND ROBOTICS, VOL. 1, 2015, : 318 - 323
  • [25] Correlation minimizing replay memory in temporal-difference reinforcement learning
    Ramicic, Mirza
Bonarini, Andrea
    NEUROCOMPUTING, 2020, 393 : 91 - 100
  • [27] Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity
    Liu, Bo
    Gemp, Ian
    Ghavamzadeh, Mohammad
    Liu, Ji
    Mahadevan, Sridhar
    Petrik, Marek
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2018, 63 : 461 - 494
  • [28] On the existence of fixed points for approximate value iteration and temporal-difference learning
    De Farias, DP
    Van Roy, B
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2000, 105 (03) : 589 - 608
  • [29] An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning
    Sutton, Richard S.
    Mahmood, A. Rupam
    White, Martha
    JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17
  • [30] Concept Extraction Using Temporal-Difference Network
    Karbasian, Habib
    Ahmadabadi, Majid N.
    Araabi, Babak N.
    EUROCON 2009: INTERNATIONAL IEEE CONFERENCE DEVOTED TO THE 150TH ANNIVERSARY OF ALEXANDER S. POPOV, VOLS 1-4, PROCEEDINGS, 2009, : 1888 - 1894