Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes

Cited by: 3
Authors
Tadic, VB [1 ]
Affiliation
[1] Univ Sheffield, Dept Automat Control & Syst Engn, Sheffield S1 3JD, S Yorkshire, England
Keywords
temporal-difference learning; neuro-dynamic programming; reinforcement learning; stochastic approximation; Markov chains;
DOI
10.1007/s10994-006-5835-z
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The mean-square asymptotic behavior of temporal-difference learning algorithms with constant step-sizes and linear function approximation is analyzed in this paper. The analysis is carried out for the case of a discounted cost function associated with a Markov chain with a finite-dimensional state space. Under mild conditions, an upper bound on the asymptotic mean-square error of these algorithms is determined as a function of the step-size. Moreover, under the same assumptions, it is shown that this bound is linear in the step-size. The main results of the paper are illustrated with examples related to M/G/1 queues and nonlinear AR models with Markov switching.
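For readers unfamiliar with the algorithm class analyzed in the paper, the following is a minimal sketch of TD(0) with linear function approximation and a constant step-size; all names, parameters, and the feature/reward interface below are illustrative assumptions, not taken from the paper itself. With a fixed step-size alpha the iterates do not converge almost surely, but the paper's result is that their asymptotic mean-square error admits an upper bound that is linear in alpha.

```python
import numpy as np

def td0_constant_step(features, rewards, next_features, gamma=0.95, alpha=0.01, theta0=None):
    """TD(0) with linear value approximation V(x) ~ theta^T phi(x) and constant step-size alpha.

    features[t]      : feature vector phi(X_t) of the current state
    next_features[t] : feature vector phi(X_{t+1}) of the successor state
    rewards[t]       : observed one-step cost/reward
    gamma            : discount factor of the discounted cost criterion
    """
    d = features.shape[1]
    theta = np.zeros(d) if theta0 is None else np.array(theta0, dtype=float)
    for phi, r, phi_next in zip(features, rewards, next_features):
        # Temporal-difference error: delta_t = r_t + gamma * theta^T phi(X_{t+1}) - theta^T phi(X_t)
        delta = r + gamma * theta @ phi_next - theta @ phi
        # Constant step-size update; the bound discussed in the abstract concerns
        # the asymptotic mean-square error of these iterates as a function of alpha.
        theta = theta + alpha * delta * phi
    return theta
```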
Pages: 107-133
Page count: 27
Related papers
50 records in total
  • [21] On Generalized Bellman Equations and Temporal-Difference Learning
    Yu, Huizhen
    Mahmood, A. Rupam
    Sutton, Richard S.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2018, 19
  • [22] Gradient Temporal-Difference Learning with Regularized Corrections
    Ghiassian, Sina
    Patterson, Andrew
    Garg, Shivam
    Gupta, Dhawal
    White, Adam
    White, Martha
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [23] Relative Loss Bounds for Temporal-Difference Learning
    Jürgen Forster
    Manfred K. Warmuth
    Machine Learning, 2003, 51 : 23 - 50
  • [24] Nonlinear Distributional Gradient Temporal-Difference Learning
    Qu, Chao
    Mannor, Shie
    Xu, Huan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [25] Temporal-Difference Reinforcement Learning with Distributed Representations
    Kurth-Nelson, Zeb
    Redish, A. David
    PLOS ONE, 2009, 4 (10):
  • [26] Gradient Temporal-Difference Learning with Regularized Corrections
    Ghiassian, Sina
    Patterson, Andrew
    Garg, Shivam
    Gupta, Dhawal
    White, Adam
    White, Martha
    25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019,
  • [27] Relative loss bounds for temporal-difference learning
    Forster, J
    Warmuth, MK
    MACHINE LEARNING, 2003, 51 (01) : 23 - 50
  • [28] Approximate value iteration and temporal-difference learning
    de Farias, DP
    Van Roy, B
    IEEE 2000 ADAPTIVE SYSTEMS FOR SIGNAL PROCESSING, COMMUNICATIONS, AND CONTROL SYMPOSIUM - PROCEEDINGS, 2000, : 48 - 51
  • [29] Target-Based Temporal-Difference Learning
    Lee, Donghwan
    He, Niao
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [30] New Versions of Gradient Temporal-Difference Learning
    Lee, Donghwan
    Lim, Han-Dong
    Park, Jihoon
    Choi, Okyong
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2023, 68 (08) : 5006 - 5013