A Simple Finite-Time Analysis of TD Learning With Linear Function Approximation

Cited by: 0
Authors
Mitra, Aritra [1]
Affiliations
[1] North Carolina State Univ, Dept Elect & Comp Engn, Raleigh, NC 27695 USA
Keywords
Temporal difference learning; Approximation algorithms; Standards; Function approximation; Convergence; Perturbation methods; Noise; Vectors; Heuristic algorithms; Delays; Finite-time analysis; reinforcement learning; stochastic approximation; temporal difference learning
DOI
10.1109/TAC.2024.3469328
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
We study the finite-time convergence of temporal-difference (TD) learning with linear function approximation under Markovian sampling. Existing proofs for this setting either assume a projection step in the algorithm to simplify the analysis, or require a fairly intricate argument to ensure stability of the iterates. We ask: Is it possible to retain the simplicity of a projection-based analysis without actually performing a projection step in the algorithm? Our main contribution is to show this is possible via a novel two-step argument. In the first step, we use induction to prove that under a standard choice of a constant step size α, the iterates generated by TD learning remain uniformly bounded in expectation. In the second step, we establish a recursion that mimics the steady-state dynamics of TD learning up to a bounded perturbation on the order of O(α²) that captures the effect of Markovian sampling. Combining these pieces leads to an overall approach that considerably simplifies existing proofs. We conjecture that our inductive proof technique will find applications in the analyses of more complex stochastic approximation algorithms, and conclude by providing some examples of such applications.
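To make the setting concrete, the following is a minimal Python/NumPy sketch of the algorithm being analyzed: TD(0) with linear function approximation driven by a single Markovian trajectory, using a constant step size α and no projection step. The transition matrix, rewards, and feature matrix below are illustrative assumptions, not anything taken from the paper.

import numpy as np

# Minimal sketch (illustrative, not the author's code): TD(0) with linear
# function approximation, Markovian sampling, a constant step size alpha,
# and no projection step. The chain, rewards, and features are made up.

rng = np.random.default_rng(0)

n_states, d = 5, 3           # small chain, d-dimensional features
gamma, alpha = 0.9, 0.02     # discount factor and constant step size

P = rng.dirichlet(np.ones(n_states), size=n_states)  # row-stochastic transition matrix
r = rng.standard_normal(n_states)                     # per-state rewards
Phi = rng.standard_normal((n_states, d))              # feature matrix; row s is phi(s)

theta = np.zeros(d)
s = 0
for t in range(50_000):
    s_next = rng.choice(n_states, p=P[s])             # Markovian sampling along one trajectory
    td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    theta = theta + alpha * td_error * Phi[s]         # TD(0) update, no projection
    s = s_next

print("final TD iterate:", theta)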
Pages: 1388-1394
Number of pages: 7