Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes

Cited by: 0

Authors:

Vladislav B. Tadić

Affiliation:

[1] University of Sheffield, Department of Automatic Control and Systems Engineering

Source:

Machine Learning | 2006 / Vol. 63

Keywords:

Temporal-difference learning; Neuro-dynamic programming; Reinforcement learning; Stochastic approximation; Markov chains

Abstract:

This paper analyzes the mean-square asymptotic behavior of temporal-difference learning algorithms with constant step-sizes and linear function approximation. The analysis covers the discounted cost function associated with a Markov chain with a finite-dimensional state-space. Under mild conditions, an upper bound on the asymptotic mean-square error of these algorithms is derived as a function of the step-size, and under the same assumptions this bound is shown to be linear in the step-size. The main results are illustrated with examples related to M/G/1 queues and nonlinear AR models with Markov switching.
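
To make the setting in the abstract concrete, the following is a minimal sketch of TD(0) with a constant step-size and a linear (here scalar) function approximation. It is not the paper's method of analysis or one of its examples: the 3-state chain `P`, the costs `COST`, the features `PHI`, and the discount `GAMMA` are all hypothetical values chosen for illustration. The sketch measures the time-averaged squared distance between the iterate and the TD fixed point, which is the quantity the abstract's bound concerns, and compares two step-sizes.

```python
import random

# Hypothetical toy problem (not taken from the paper).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]          # transition matrix of an ergodic 3-state chain
COST = [1.0, 0.0, 2.0]         # per-state cost
PHI = [1.0, 2.0, 3.0]          # scalar feature phi(s); value estimate is phi(s) * theta
GAMMA = 0.9                    # discount factor


def stationary(P, iters=1000):
    """Stationary distribution of P by power iteration."""
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(iters):
        d = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
    return d


def td_fixed_point(P, cost, phi, gamma):
    """Solve a * theta = b, the scalar TD(0) fixed-point equation."""
    n = len(P)
    d = stationary(P)
    a = sum(d[s] * phi[s] * (phi[s] - gamma * sum(P[s][t] * phi[t] for t in range(n)))
            for s in range(n))
    b = sum(d[s] * phi[s] * cost[s] for s in range(n))
    return b / a


def td0_tail_mse(P, cost, phi, gamma, alpha, steps=100_000, burn_in=20_000, seed=0):
    """Run constant-step-size TD(0); return the mean squared error after burn-in."""
    rng = random.Random(seed)
    theta_star = td_fixed_point(P, cost, phi, gamma)
    states = range(len(P))
    theta, s = 0.0, 0
    sq_err, count = 0.0, 0
    for k in range(steps):
        t = rng.choices(states, weights=P[s])[0]            # sample next state
        delta = cost[s] + gamma * phi[t] * theta - phi[s] * theta  # TD error
        theta += alpha * delta * phi[s]                     # constant step-size update
        if k >= burn_in:
            sq_err += (theta - theta_star) ** 2
            count += 1
        s = t
    return sq_err / count


if __name__ == "__main__":
    for alpha in (0.005, 0.05):
        print(alpha, td0_tail_mse(P, COST, PHI, GAMMA, alpha))
```

With a constant step-size the iterate does not converge to the fixed point but keeps fluctuating around it, so the tail error stays positive; in this toy setup one would expect the error for the smaller step-size to be the smaller of the two, in line with a bound that shrinks with the step-size.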

Pages: 107-133

Number of pages: 26
