Sample Complexity and Overparameterization Bounds for Temporal-Difference Learning With Neural Network Approximation

Cited by: 2
Authors
Cayci, Semih [1 ,2 ]
Satpathi, Siddhartha [3 ,4 ]
He, Niao [5 ]
Srikant, R. [1 ,6 ]
Affiliations
[1] Univ Illinois, Coordinated Sci Lab, Urbana, IL 61801 USA
[2] Rhein Westfal TH Aachen, Chair Math Informat Proc, D-52062 Aachen, Germany
[3] Univ Illinois, Urbana, IL 61801 USA
[4] Mayo Clin, Rochester, MN 55902 USA
[5] Swiss Fed Inst Technol, Dept Comp Sci, CH-8006 Zurich, Switzerland
[6] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
Funding
Swiss National Science Foundation; US National Science Foundation
Keywords
Neural networks; Approximation algorithms; Markov processes; Convergence; Complexity theory; Reinforcement learning; Kernel; reinforcement learning (RL); stochastic approximation; temporal-difference (TD) learning;
DOI
10.1109/TAC.2023.3234234
Chinese Library Classification (CLC)
TP [automation and computer technology];
Discipline code
0812 ;
Abstract
In this article, we study the dynamics of temporal-difference (TD) learning with neural network-based value function approximation over a general state space, namely, neural TD learning. We consider two practically used algorithms, projection-free and max-norm regularized neural TD learning, and establish the first convergence bounds for these algorithms. An interesting observation from our results is that max-norm regularization can dramatically improve the performance of TD learning algorithms in terms of sample complexity and overparameterization. The results in this work rely on a Lyapunov drift analysis of the network parameters as a stopped and controlled random process.
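The abstract describes TD learning with a neural value-function approximator and notes that max-norm regularization improves sample complexity and overparameterization requirements. Below is a minimal, hypothetical sketch of that setup on a toy Markov reward process, not the authors' algorithm: the two-layer ReLU network with fixed output signs, the width m, the step size, the radius R, and the per-neuron projection around initialization are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-state Markov reward process (assumed for illustration).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9

def features(s, d=3):
    x = np.zeros(d)
    x[s] = 1.0          # one-hot state encoding
    return x

# Two-layer ReLU network V(s) = (1/sqrt(m)) * sum_i c_i * relu(w_i . x(s)).
# Only the hidden weights W are trained; output signs c are fixed.
m, d = 64, 3
W0 = rng.normal(size=(m, d))           # initialization, kept for the projection
c = rng.choice([-1.0, 1.0], size=m)    # fixed random output layer
W = W0.copy()

def value(W, x):
    return c @ np.maximum(W @ x, 0.0) / np.sqrt(m)

def grad(W, x):
    # dV/dW_i = (c_i / sqrt(m)) * 1{w_i . x > 0} * x
    act = (W @ x > 0).astype(float)
    return (c * act)[:, None] * x[None, :] / np.sqrt(m)

alpha, R = 0.05, 1.0    # step size and regularization radius (assumed values)
s = 0
for t in range(5000):
    s_next = rng.choice(3, p=P[s])
    x, x_next = features(s), features(s_next)
    delta = r[s] + gamma * value(W, x_next) - value(W, x)  # TD error
    W = W + alpha * delta * grad(W, x)                     # TD(0) update
    # Max-norm-style regularization: project each neuron's weight vector
    # back onto an l2-ball of radius R around its initialization.
    diff = W - W0
    norms = np.linalg.norm(diff, axis=1, keepdims=True)
    W = W0 + diff * np.minimum(1.0, R / np.maximum(norms, 1e-12))
    s = s_next
```

The projection step keeps every neuron within distance R of its initialization, which is one way a per-unit norm constraint can be enforced after each stochastic update; the paper's precise regularizer and analysis should be consulted for the actual algorithm.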
Pages: 2891 - 2905
Page count: 15
Related papers (50 total)
  • [11] A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning
    Choi, David
    Van Roy, Benjamin
    DISCRETE EVENT DYNAMIC SYSTEMS: THEORY AND APPLICATIONS, 2006, 16 (02): : 207 - 239
  • [13] A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning
    Berthier, Eloise
    Kobeissi, Ziad
    Bach, Francis
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [14] Improving Reinforcement Learning Using Temporal-Difference Network
    Karbasian, Habib
    Ahmadabadi, Majid N.
    Araabi, Babak N.
    EUROCON 2009: INTERNATIONAL IEEE CONFERENCE DEVOTED TO THE 150TH ANNIVERSARY OF ALEXANDER S. POPOV, VOLS 1-4, PROCEEDINGS, 2009, : 1716 - 1722
  • [15] An Adaptive Network Slice Combination Algorithm Based on Multistep Temporal-Difference Learning
    Wu, Guomin
    Tan, Guoping
    IEEE WIRELESS COMMUNICATIONS LETTERS, 2022, 11 (06) : 1128 - 1132
  • [16] Weak Convergence Properties of Constrained Emphatic Temporal-difference Learning with Constant and Slowly Diminishing Stepsize
    Yu, Huizhen
    JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17
  • [17] Distributed multi-agent temporal-difference learning with full neighbor information
    Peng, Zhinan
    Hu, Jiangping
    Luo, Rui
    Ghosh, Bijoy K.
    CONTROL THEORY AND TECHNOLOGY, 2020, 18 (04) : 379 - 389
  • [18] Multi-Agent Temporal-Difference Learning with Linear Function Approximation: Weak Convergence under Time-Varying Network Topologies
    Stankovic, Milos S.
    Stankovic, Srdjan S.
    2016 AMERICAN CONTROL CONFERENCE (ACC), 2016, : 167 - 172
  • [19] On the asymptotic behavior of a constant stepsize temporal-difference learning algorithm
    Tadić, Vladislav B.
    COMPUTATIONAL LEARNING THEORY, 1999, 1572 : 126 - 137
  • [20] Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
    Tadić, Vladislav B.
    MACHINE LEARNING, 2006, 63 : 107 - 133