Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes

Cited by: 3

Authors:

Tadic, VB [1]

Affiliations:

[1] Univ Sheffield, Dept Automat Control & Syst Engn, Sheffield S1 3JD, S Yorkshire, England

Source:

MACHINE LEARNING | 2006, Vol. 63, No. 02

Keywords:

temporal-difference learning; neuro-dynamic programming; reinforcement learning; stochastic approximation; Markov chains;

DOI:

10.1007/s10994-006-5835-z

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The mean-square asymptotic behavior of temporal-difference learning algorithms with constant step-sizes and linear function approximation is analyzed in this paper. The analysis is carried out for the case of a discounted cost function associated with a Markov chain with a finite-dimensional state-space. Under mild conditions, an upper bound for the asymptotic mean-square error of these algorithms is determined as a function of the step-size. Moreover, under the same assumptions, it is also shown that this bound is linear in the step-size. The main results of the paper are illustrated with examples related to M/G/1 queues and nonlinear AR models with Markov switching.
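
The setting described in the abstract can be illustrated with a minimal sketch of TD(0) with linear function approximation and a constant step-size. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three-state chain `P`, the costs `COST`, the discount `GAMMA`, the one-hot features `PHI` (chosen so that the TD fixed point coincides with the true discounted cost), and the helper names. The function `asymptotic_mse` only estimates the mean-square error empirically after a burn-in period; it does not compute the paper's bound.

```python
import random

# Illustrative problem data (assumptions, not from the paper).
GAMMA = 0.8                      # discount factor
P = [[0.5, 0.5, 0.0],            # transition probabilities of a 3-state chain
     [0.1, 0.6, 0.3],
     [0.4, 0.0, 0.6]]
COST = [1.0, 0.0, 2.0]           # one-step cost in each state
PHI = [[1.0, 0.0, 0.0],          # one-hot feature vectors, so the TD fixed
       [0.0, 1.0, 0.0],          # point equals the true discounted cost
       [0.0, 0.0, 1.0]]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def true_values(iters=500):
    """Discounted cost J = c + GAMMA * P J, computed by fixed-point iteration."""
    J = [0.0] * len(COST)
    for _ in range(iters):
        J = [COST[x] + GAMMA * dot(P[x], J) for x in range(len(COST))]
    return J


def asymptotic_mse(alpha, steps=60000, burn_in=20000, seed=0):
    """Run constant step-size TD(0) and average ||theta - J||^2 after burn-in."""
    rng = random.Random(seed)
    target = true_values()
    theta = [0.0] * len(PHI[0])
    x = 0
    total = 0.0
    for n in range(steps):
        # Sample the next state of the Markov chain.
        x_next = rng.choices(range(len(P)), weights=P[x])[0]
        # Temporal-difference error for the linear approximation phi(x)^T theta.
        delta = COST[x] + GAMMA * dot(PHI[x_next], theta) - dot(PHI[x], theta)
        # Constant step-size update.
        theta = [t + alpha * delta * f for t, f in zip(theta, PHI[x])]
        if n >= burn_in:
            total += sum((t - j) ** 2 for t, j in zip(theta, target))
        x = x_next
    return total / (steps - burn_in)


if __name__ == "__main__":
    for alpha in (0.01, 0.1):
        print(alpha, asymptotic_mse(alpha))
```

Running the script prints the empirical error for two step-sizes. In this toy setup the smaller step-size should give the smaller error, which is in line with a bound that grows linearly with the step-size, although a single simulation is not evidence for the general result.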

Pages: 107-133

Page count: 27

50 items in total

[31] On Generalized Bellman Equations and Temporal-Difference Learning
Yu, Huizhen
Mahmood, Ashique Rupam
Sutton, Richard S.
ADVANCES IN ARTIFICIAL INTELLIGENCE, CANADIAN AI 2017, 2017, 10233 : 3 - 14
[32] Postponed Updates for Temporal-Difference Reinforcement Learning
van Seijen, Harm
Whiteson, Shimon
2009 9TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, 2009, : 665 - +
[33] Weak Convergence Properties of Constrained Emphatic Temporal-difference Learning with Constant and Slowly Diminishing Stepsize
Yu, Huizhen
JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17
[34] On the convergence of temporal-difference learning with linear function approximation
Tadic, V
MACHINE LEARNING, 2001, 42 (03) : 241 - 267
[35] Acceleration of Frank-Wolfe Algorithms with Open-Loop Step-Sizes
Wirth, Elias
Kerdreux, Thomas
Pokutta, Sebastian
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206 : 77 - 100
[36] On average versus discounted reward temporal-difference learning
Tsitsiklis, JN
Van Roy, B
MACHINE LEARNING, 2002, 49 (2-3) : 179 - 191
[37] Optimal Active Fault Diagnosis by Temporal-Difference Learning
Skach, Jan
Puncochar, Ivo
Lewis, Frank L.
2016 IEEE 55TH CONFERENCE ON DECISION AND CONTROL (CDC), 2016, : 2146 - 2151
[38] On Average Versus Discounted Reward Temporal-Difference Learning
John N. Tsitsiklis
Benjamin Van Roy
Machine Learning, 2002, 49 : 179 - 191
[39] Temporal-Difference Learning with Sampling Baseline for Image Captioning
Chen, Hui
Ding, Guiguang
Zhao, Sicheng
Han, Jungong
THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 6706 - 6713
[40] On the Convergence of Temporal-Difference Learning with Linear Function Approximation
Vladislav Tadić
Machine Learning, 2001, 42 : 241 - 267
