Sample Complexity and Overparameterization Bounds for Temporal-Difference Learning With Neural Network Approximation

Cited by: 2
Authors
Cayci, Semih [1 ,2 ]
Satpathi, Siddhartha [3 ,4 ]
He, Niao [5 ]
Srikant, R. [1 ,6 ]
Affiliations
[1] Univ Illinois, Coordinated Sci Lab, Urbana, IL 61801 USA
[2] Rhein Westfal TH Aachen, Chair Math Informat Proc, D-52062 Aachen, Germany
[3] Univ Illinois, Urbana, IL 61801 USA
[4] Mayo Clin, Rochester, MN 55902 USA
[5] Swiss Fed Inst Technol, Dept Comp Sci, CH-8006 Zurich, Switzerland
[6] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
Funding
Swiss National Science Foundation; US National Science Foundation (NSF);
Keywords
Neural networks; Approximation algorithms; Markov processes; Convergence; Complexity theory; Reinforcement learning; Kernel; reinforcement learning (RL); stochastic approximation; temporal-difference (TD) learning;
DOI
10.1109/TAC.2023.3234234
Chinese Library Classification (CLC)
TP [Automation and Computer Technology];
Discipline Code
0812;
Abstract
In this article, we study the dynamics of temporal-difference (TD) learning with neural network-based value function approximation over a general state space, namely, neural TD learning. We consider two practically used algorithms, projection-free and max-norm regularized neural TD learning, and establish the first convergence bounds for these algorithms. An interesting observation from our results is that max-norm regularization can dramatically improve the performance of TD learning algorithms in terms of sample complexity and overparameterization. The results in this work rely on a Lyapunov drift analysis of the network parameters as a stopped and controlled random process.
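The abstract centers on max-norm regularized neural TD learning. As a rough illustration of that idea (not the paper's exact algorithm), the sketch below implements TD(0) with a one-hidden-layer ReLU value network in Python/NumPy, projecting each hidden neuron's weight vector onto an l2-ball after every semi-gradient update; the width M, state dimension D, step size ALPHA, discount GAMMA, and radius LAMBDA are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: one-hidden-layer ReLU value network trained with
# semi-gradient TD(0), where each hidden neuron's weight vector is projected
# onto an l2-ball of radius LAMBDA after every update (one common form of
# max-norm regularization). All constants are assumed for illustration; this
# is not the paper's exact construction.

M, D = 64, 4                                # network width, state dim (assumed)
GAMMA, ALPHA, LAMBDA = 0.9, 0.01, 1.0       # discount, step size, radius

rng = np.random.default_rng(0)
W = rng.normal(size=(M, D)) / np.sqrt(D)    # hidden-layer weights
a = rng.choice([-1.0, 1.0], size=M)         # fixed random output weights

def value(s, W):
    # V(s) = (1/sqrt(M)) * sum_i a_i * relu(w_i . s)
    return a @ np.maximum(W @ s, 0.0) / np.sqrt(M)

def td0_step(s, r, s_next, W):
    # TD error: delta = r + gamma * V(s') - V(s)
    delta = r + GAMMA * value(s_next, W) - value(s, W)
    active = (W @ s > 0.0).astype(float)                     # ReLU gates, (M,)
    grad = (a * active)[:, None] * s[None, :] / np.sqrt(M)   # dV(s)/dW, (M, D)
    W = W + ALPHA * delta * grad
    # Max-norm step: project each row w_i back onto ||w_i||_2 <= LAMBDA.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W * np.minimum(1.0, LAMBDA / np.maximum(norms, 1e-12))

# Toy usage on a single synthetic transition (s, r, s').
s, s_next = rng.normal(size=D), rng.normal(size=D)
W = td0_step(s, 1.0, s_next, W)
print(value(s, W))
```

The per-neuron projection keeps the parameters in a bounded set, which is the kind of property a Lyapunov drift analysis of the parameter process, viewed as a stopped and controlled random process as in the abstract, can exploit.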
Pages
2891-2905 (15 pages)
Related Papers
50 in total
  • [21] Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
    Tadic, VB
    MACHINE LEARNING, 2006, 63 (02) : 107 - 133
  • [22] Implementing Temporal-Difference Learning with the Scaled Conjugate Gradient Algorithm
    Falas, Tasos
    Stafylopatis, Andreas
    NEURAL PROCESSING LETTERS, 2005, 22 (03) : 361 - 375
  • [23] Using temporal-difference learning for multi-agent bargaining
    Huang, Shiu-li
    Lin, Fu-ren
    ELECTRONIC COMMERCE RESEARCH AND APPLICATIONS, 2008, 7 (04) : 432 - 442
  • [24] Temporal-Difference Learning: An Online Support Vector Regression Approach
    Teixeira, Hugo Tanzarella
    Bottura, Celso Pascoli
    ICIMCO 2015 PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON INFORMATICS IN CONTROL, AUTOMATION AND ROBOTICS, VOL. 1, 2015, : 318 - 323
  • [25] Correlation minimizing replay memory in temporal-difference reinforcement learning
    Ramicic, Mirza
Bonarini, Andrea
    NEUROCOMPUTING, 2020, 393 : 91 - 100
  • [27] Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity
    Liu, Bo
    Gemp, Ian
    Ghavamzadeh, Mohammad
    Liu, Ji
    Mahadevan, Sridhar
    Petrik, Marek
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2018, 63 : 461 - 494
  • [28] On the existence of fixed points for approximate value iteration and temporal-difference learning
    De Farias, DP
    Van Roy, B
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2000, 105 (03) : 589 - 608
  • [29] An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning
    Sutton, Richard S.
    Mahmood, A. Rupam
    White, Martha
    JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17
  • [30] Concept Extraction Using Temporal-Difference Network
    Karbasian, Habib
    Ahmadabadi, Majid N.
    Araabi, Babak N.
    EUROCON 2009: INTERNATIONAL IEEE CONFERENCE DEVOTED TO THE 150TH ANNIVERSARY OF ALEXANDER S. POPOV, VOLS 1-4, PROCEEDINGS, 2009, : 1888 - 1894