Average cost temporal-difference learning

Cited by: 0

Authors:

Tsitsiklis, JN [1]

Van Roy, B [1]

Affiliation:

[1] MIT, Informat & Decis Syst Lab, Cambridge, MA 02139 USA

Source:

PROCEEDINGS OF THE 36TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5 | 1997

Keywords:

DOI:

Not available

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

We describe a variant of temporal-difference learning that approximates average and differential costs of an irreducible aperiodic Markov chain. Approximations are comprised of linear combinations of fixed basis functions whose weights are incrementally updated during a single endless trajectory of the Markov chain. We present results concerning convergence and the limit of convergence. We also provide a bound on the resulting approximation error that exhibits an interesting dependence on the "mixing time" of the Markov chain. The results parallel previous work by the authors, involving approximations of discounted cost-to-go.
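
The abstract does not spell out the update rule, so the following is only a minimal sketch of one common formulation of average-cost TD(λ) with linear function approximation: a running estimate `mu` of the average cost, a weight vector `r` for the fixed basis functions, and an eligibility trace `z`, all updated along a single trajectory. The two-state chain, the costs, the single basis function, and the step-size schedule are illustrative assumptions, not taken from the paper.

```python
import random


def average_cost_td(P, g, phi, steps, lam=0.0, seed=0):
    """Sketch of average-cost TD(lambda) on one simulated trajectory.

    P   : row-stochastic transition matrix (list of lists)
    g   : per-state cost
    phi : per-state feature vector (fixed basis functions)
    Returns (mu, r): average-cost estimate and basis-function weights.
    """
    rng = random.Random(seed)
    k = len(phi[0])
    r = [0.0] * k   # weights of the basis functions
    z = [0.0] * k   # eligibility trace
    mu = 0.0        # running estimate of the average cost
    state = 0
    for t in range(steps):
        gamma = 10.0 / (100.0 + t)  # assumed diminishing step size
        nxt = rng.choices(range(len(P)), weights=P[state])[0]
        # Decay the trace and add the current state's features.
        z = [lam * zj + pj for zj, pj in zip(z, phi[state])]
        v = sum(rj * pj for rj, pj in zip(r, phi[state]))
        v_next = sum(rj * pj for rj, pj in zip(r, phi[nxt]))
        # Temporal difference for differential cost: the average-cost
        # estimate is subtracted in place of discounting.
        d = g[state] - mu + v_next - v
        r = [rj + gamma * d * zj for rj, zj in zip(r, z)]
        mu = (1.0 - gamma) * mu + gamma * g[state]
        state = nxt
    return mu, r


# Hypothetical two-state chain; its stationary distribution is (5/6, 1/6),
# so the exact average cost of g = [0, 1] is 1/6.
P = [[0.9, 0.1], [0.5, 0.5]]
g = [0.0, 1.0]
phi = [[0.0], [1.0]]  # one basis function; its span excludes constants
mu, r = average_cost_td(P, g, phi, steps=50_000)
print(mu, r)
```

The single basis function is chosen so that the constant function is not in its span; differential costs are only defined up to an additive constant, so a constant feature would leave the weights underdetermined.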


Pages: 498-502

Page count: 5
