Pathologies of Temporal Difference Methods in Approximate Dynamic Programming

被引:2
|
作者
Bertsekas, Dimitri P. [1 ]
机构
[1] MIT, Dept Electr Engn & Comp Sci, Cambridge, MA 02139 USA
来源
49TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC) | 2010年
关键词
LEARNING TETRIS; BOUNDS;
D O I
10.1109/CDC.2010.5717644
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated convergence behavior is complex, and not well understood at present. An important question is whether the policy iteration process is seriously hampered by oscillations between poor policies, roughly similar to the attraction of gradient methods to poor local minima. There has been little apparent concern in the approximate DP/reinforcement learning literature about this possibility, even though it has been documented with several simple examples. Recent computational experimentation with the game of tetris, a popular testbed for approximate DP algorithms over a 15-year period, has brought the issue to sharp focus. In particular, using a standard set of 22 features and temporal difference methods, an average score of a few thousands was achieved. Using the same features and a random search method, an overwhelmingly better average score was achieved (600,000-900,000). The paper explains the likely mechanism of this phenomenon, and derives conditions under which it will not occur.
引用
收藏
页码:3034 / 3039
页数:6
相关论文
共 50 条
  • [1] Separable dynamic programming and approximate decomposition methods
    Bertsekas, Dimitri P.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2007, 52 (05) : 911 - 916
  • [2] Approximate Dynamic Programming with Probabilistic Temporal Logic Constraints
    Li, Lening
    Fu, Jie
    2019 AMERICAN CONTROL CONFERENCE (ACC), 2019, : 1696 - 1703
  • [3] Approximate Dynamic Programming with Recursive Least-Squares Temporal Difference Learning for Adaptive Traffic Signal Control
    Yin, Biao
    Dridi, Mahjoub
    El Moudni, Abdellah
    2015 54TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2015, : 3463 - 3468
  • [4] Topological Approximate Dynamic Programming under Temporal Logic Constraints
    Li, Lening
    Fu, Jie
    2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 5330 - 5337
  • [5] Decentralized Bayesian search using approximate dynamic programming methods
    Zhao, Yijia
    Patek, Stephen D.
    Beling, Peter A.
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2008, 38 (04): : 970 - 975
  • [6] Path Planning in an Uncertain Environment Using Approximate Dynamic Programming Methods
    Bienkowski, Adam
    Sidoti, David
    Zhang, Lingyi
    Pattipati, Krishna R.
    Sampson, Charles R.
    Hansen, James
    2018 21ST INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), 2018, : 2542 - 2547
  • [7] Approximate dynamic programming methods for an inventory allocation problem under uncertainty
    Topaloglu, Huseyin
    Kunnumkal, Sumit
    NAVAL RESEARCH LOGISTICS, 2006, 53 (08) : 822 - 841
  • [8] Perspectives of approximate dynamic programming
    Powell, Warren B.
    ANNALS OF OPERATIONS RESEARCH, 2016, 241 (1-2) : 319 - 356
  • [9] A Survey of Approximate Dynamic Programming
    Wang Lin
    Peng Hui
    Zhu Hua-yong
    Shen Lin-cheng
    2009 INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS, VOL 2, PROCEEDINGS, 2009, : 396 - 399
  • [10] APPROXIMATE METHODS OF DISCRETE PROGRAMMING
    KORBUT, AA
    FINKELSHTEYN, YY
    ENGINEERING CYBERNETICS, 1983, 21 (01): : 124 - 134