Pathologies of Temporal Difference Methods in Approximate Dynamic Programming

被引：2

作者：

Bertsekas, Dimitri P. ^{[1
]}

机构：

[1] MIT, Dept Electr Engn & Comp Sci, Cambridge, MA 02139 USA

来源：

49TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC) | 2010年

关键词：

LEARNING TETRIS; BOUNDS;

D O I：

10.1109/CDC.2010.5717644

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated convergence behavior is complex, and not well understood at present. An important question is whether the policy iteration process is seriously hampered by oscillations between poor policies, roughly similar to the attraction of gradient methods to poor local minima. There has been little apparent concern in the approximate DP/reinforcement learning literature about this possibility, even though it has been documented with several simple examples. Recent computational experimentation with the game of tetris, a popular testbed for approximate DP algorithms over a 15-year period, has brought the issue to sharp focus. In particular, using a standard set of 22 features and temporal difference methods, an average score of a few thousands was achieved. Using the same features and a random search method, an overwhelmingly better average score was achieved (600,000-900,000). The paper explains the likely mechanism of this phenomenon, and derives conditions under which it will not occur.

引用

页码：3034 / 3039

页数：6

共 50 条

[1] Separable dynamic programming and approximate decomposition methods
Bertsekas, Dimitri P.
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2007, 52 (05) : 911 - 916
[2] Approximate Dynamic Programming with Probabilistic Temporal Logic Constraints
Li, Lening
Fu, Jie
2019 AMERICAN CONTROL CONFERENCE (ACC), 2019, : 1696 - 1703
[3] Approximate Dynamic Programming with Recursive Least-Squares Temporal Difference Learning for Adaptive Traffic Signal Control
Yin, Biao
Dridi, Mahjoub
El Moudni, Abdellah
2015 54TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2015, : 3463 - 3468
[4] Topological Approximate Dynamic Programming under Temporal Logic Constraints
Li, Lening
Fu, Jie
2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 5330 - 5337
[5] Decentralized Bayesian search using approximate dynamic programming methods
Zhao, Yijia
Patek, Stephen D.
Beling, Peter A.
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2008, 38 (04): : 970 - 975
[6] Path Planning in an Uncertain Environment Using Approximate Dynamic Programming Methods
Bienkowski, Adam
Sidoti, David
Zhang, Lingyi
Pattipati, Krishna R.
Sampson, Charles R.
Hansen, James
2018 21ST INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), 2018, : 2542 - 2547
[7] Approximate dynamic programming methods for an inventory allocation problem under uncertainty
Topaloglu, Huseyin
Kunnumkal, Sumit
NAVAL RESEARCH LOGISTICS, 2006, 53 (08) : 822 - 841
[8] Perspectives of approximate dynamic programming
Powell, Warren B.
ANNALS OF OPERATIONS RESEARCH, 2016, 241 (1-2) : 319 - 356
[9] A Survey of Approximate Dynamic Programming
Wang Lin
Peng Hui
Zhu Hua-yong
Shen Lin-cheng
2009 INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS, VOL 2, PROCEEDINGS, 2009, : 396 - 399
[10] APPROXIMATE METHODS OF DISCRETE PROGRAMMING
KORBUT, AA
FINKELSHTEYN, YY
ENGINEERING CYBERNETICS, 1983, 21 (01): : 124 - 134

← 1 2 3 4 5 →