Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations

Cited by: 1
Author
Dimitri P. Bertsekas [1]
Affiliation
[1] Department of Electrical Engineering and Computer Science, and the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology
Keywords
Reinforcement learning; dynamic programming; Markovian decision problems; aggregation; feature-based architectures; policy iteration; deep neural networks; rollout algorithms;
DOI
Not available
CLC Classification
TP18 [Artificial intelligence theory]; O225 [Game theory];
Discipline Codes
070105; 081104; 0812; 0835; 1201; 1405;
Abstract
In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
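The aggregation idea sketched in the abstract can be illustrated concretely for the simplest case of hard aggregation: a feature map partitions the original states into groups, an aggregate MDP is formed over the groups, and its optimal cost is used as a (piecewise-constant) approximation of the original cost function. The sketch below is illustrative only and is not from the paper; the function names, the uniform choice of disaggregation probabilities, and the toy MDP are all assumptions made for the example.

```python
import numpy as np

def solve_mdp(P, g, alpha, iters=2000):
    """Value iteration for a finite-state discounted MDP.
    P: (num_actions, n, n) transition probabilities
    g: (num_actions, n) expected stage costs
    alpha: discount factor in (0, 1)."""
    n = P.shape[1]
    J = np.zeros(n)
    for _ in range(iters):
        J = np.min(g + alpha * (P @ J), axis=0)
    return J

def hard_aggregate(P, g, alpha, phi):
    """Hard aggregation: phi[x] gives the aggregate state (feature value)
    of original state x.  Disaggregation probabilities are taken uniform
    over each group (an illustrative choice; others are possible)."""
    num_actions, n, _ = P.shape
    m = int(phi.max()) + 1
    # Aggregation matrix Phi (n x m): Phi[x, s] = 1 iff phi[x] = s.
    Phi = np.zeros((n, m))
    Phi[np.arange(n), phi] = 1.0
    # Uniform disaggregation matrix D (m x n): rows sum to 1 over each group.
    D = Phi.T / Phi.sum(axis=0)[:, None]
    # Aggregate transition probabilities and stage costs.
    P_agg = np.stack([D @ P[a] @ Phi for a in range(num_actions)])
    g_agg = np.stack([D @ g[a] for a in range(num_actions)])
    r = solve_mdp(P_agg, g_agg, alpha)     # optimal cost of the aggregate MDP
    return Phi @ r                          # approximate cost on original states
```

When states grouped together happen to have identical cost-to-go, this approximation is exact; in general the aggregate cost is a nonlinear function of the feature value, which is the source of the extra approximation power the abstract refers to.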
Pages: 1-31 (31 pages)
Related Papers
20 items in total
  • [1] Bertsekas, D. P. Approximate policy iteration: a survey and some new methods. Journal of Control Theory and Applications, 2011(03).
  • [2] Schmidhuber, J. Deep learning in neural networks: An overview. Neural Networks, 2014.
  • [3] Yu, H., Bertsekas, D. P. Error bounds for approximations from projected linear equations. Mathematics of Operations Research, 2010, 35(2): 306-329.
  • [4] Hinton, G. E., Osindero, S., Teh, Y.-W. A fast learning algorithm for deep belief nets. Neural Computation, 2006, 18(7): 1527-1554.
  • [5] Tesauro, G. Programming backgammon using self-teaching neural nets. Artificial Intelligence, 2002(1).
  • [6] Bertsekas, D. P., Castañon, D. A. Rollout algorithms for stochastic scheduling problems. Journal of Heuristics, 1999, 5(1): 89-108.
  • [7] Bertsekas, D. P., Tsitsiklis, J. N., Wu, C. Rollout algorithms for combinatorial optimization. Journal of Heuristics, 1997(3).
  • [8] Tsitsiklis, J. N., Van Roy, B. Feature-based methods for large scale dynamic programming. Machine Learning, 1996(1).
  • [9] Bertsekas, D. P. A counterexample to temporal differences learning. Neural Computation, 1995, 7(2): 270-279.
  • [10] Tesauro, G. Temporal difference learning and TD-Gammon. Communications of the ACM, 1995, 38(3): 58-68.