Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes

Cited by: 0
Author
Vladislav B. Tadić
Affiliation
[1] University of Sheffield, Department of Automatic Control and Systems Engineering
Source
Machine Learning | 2006 / Vol. 63
Keywords
Temporal-difference learning; Neuro-dynamic programming; Reinforcement learning; Stochastic approximation; Markov chains
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
The mean-square asymptotic behavior of temporal-difference learning algorithms with constant step-sizes and linear function approximation is analyzed in this paper. The analysis is carried out for the case of a discounted cost function associated with a Markov chain with a finite-dimensional state space. Under mild conditions, an upper bound for the asymptotic mean-square error of these algorithms is determined as a function of the step-size. Under the same assumptions, it is also shown that this bound is linear in the step-size. The main results of the paper are illustrated with examples related to M/G/1 queues and nonlinear AR models with Markov switching.
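As a rough illustration of the setting the abstract describes, the sketch below runs TD(0) with a constant step-size on a toy two-state Markov chain and measures the time-averaged squared error of the weights after a burn-in period. Everything in it is an assumption made for illustration and is not taken from the paper: the chain, the costs, the discount factor, the one-hot features (a trivial special case of linear function approximation), and the function name `td0_mse`. It does not reproduce the paper's bound or its M/G/1 and AR examples; it only lets one check empirically that a smaller constant step-size leaves a smaller residual error.

```python
import random


def td0_mse(alpha, n_steps=200_000, burn_in=50_000, seed=0):
    """Time-averaged squared weight error of constant-step TD(0) on a toy chain."""
    rng = random.Random(seed)
    gamma = 0.9              # discount factor (assumed)
    p_to_0 = [0.7, 0.6]      # P(next state = 0 | current state), assumed
    cost = [1.0, 0.0]        # per-step cost of each state, assumed

    # Exact discounted costs v solve (I - gamma * P) v = cost (2x2 Cramer's rule).
    a11, a12 = 1 - gamma * p_to_0[0], -gamma * (1 - p_to_0[0])
    a21, a22 = -gamma * p_to_0[1], 1 - gamma * (1 - p_to_0[1])
    det = a11 * a22 - a12 * a21
    v = [(a22 * cost[0] - a12 * cost[1]) / det,
         (a11 * cost[1] - a21 * cost[0]) / det]

    theta = [0.0, 0.0]       # one weight per one-hot feature
    s, total = 0, 0.0
    for t in range(n_steps):
        s_next = 0 if rng.random() < p_to_0[s] else 1
        # Temporal-difference error for the observed transition s -> s_next.
        td_error = cost[s] + gamma * theta[s_next] - theta[s]
        # With one-hot features only the weight of the current state moves.
        theta[s] += alpha * td_error
        if t >= burn_in:
            total += (theta[0] - v[0]) ** 2 + (theta[1] - v[1]) ** 2
        s = s_next
    return total / (n_steps - burn_in)


if __name__ == "__main__":
    for alpha in (0.1, 0.01):
        print(alpha, td0_mse(alpha))
```

The step-sizes in the usage loop are chosen so that the burn-in is long enough for the initial error to die out; a much smaller step-size would need a longer run before the averaged error reflects the asymptotic regime.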
Pages: 107-133
Number of pages: 26
Related papers
50 items in total
[31]   Gradient temporal-difference learning for off-policy evaluation using emphatic weightings [J].
Cao, Jiaqing ;
Liu, Quan ;
Zhu, Fei ;
Fu, Qiming ;
Zhong, Shan .
INFORMATION SCIENCES, 2021, 580 :311-330
[32]   Temporal-difference learning with nonlinear function approximation: lazy training and mean field regimes [J].
Agazzi, Andrea ;
Lu, Jianfeng .
MATHEMATICAL AND SCIENTIFIC MACHINE LEARNING, VOL 145, 2021, 145 :37-74
[33]   An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning [J].
Sutton, Richard S. ;
Mahmood, A. Rupam ;
White, Martha .
JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17
[34]   Optimization of music education strategy guided by the temporal-difference reinforcement learning algorithm [J].
Su, Yingwei ;
Wang, Yuan .
Soft Computing, 2024, 28 (13-14) :8279-8291
[35]   An Adaptive Network Slice Combination Algorithm Based on Multistep Temporal-Difference Learning [J].
Wu, Guomin ;
Tan, Guoping .
IEEE WIRELESS COMMUNICATIONS LETTERS, 2022, 11 (06) :1128-1132
[36]   Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach [J].
Jia, Yanwei ;
Zhou, Xun Yu .
JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
[37]   IMPROVING REINFORCEMENT LEARNING USING TEMPORAL-DIFFERENCE NETWORK EUROCON2009 [J].
Karbasian, Habib ;
Ahmadabadi, Majid N. ;
Araabi, Babak N. .
EUROCON 2009: INTERNATIONAL IEEE CONFERENCE DEVOTED TO THE 150 ANNIVERSARY OF ALEXANDER S. POPOV, VOLS 1-4, PROCEEDINGS, 2009, :1716-1722
[38]   Particle swarm optimization based on temporal-difference learning for solving multi-objective optimization problems [J].
Zhang, Desong ;
Zhu, Guangyu .
COMPUTING, 2023, 105 (08) :1795-1820
[39]   Particle swarm optimization based on temporal-difference learning for solving multi-objective optimization problems [J].
Desong Zhang ;
Guangyu Zhu .
Computing, 2023, 105 :1795-1820
[40]   Striatal and Tegmental Neurons Code Critical Signals for Temporal-Difference Learning of State Value in Domestic Chicks [J].
Wen, Chentao ;
Ogura, Yukiko ;
Matsushima, Toshiya .
FRONTIERS IN NEUROSCIENCE, 2016, 10