Understanding End-to-End Model-Based Reinforcement Learning Methods as Implicit Parameterization

Cited by: 0
Authors
Gehring, Clement [1 ]
Kawaguchi, Kenji [2 ]
Huang, Jiaoyang [3 ]
Kaelbling, Leslie Pack [1 ]
Affiliations
[1] MIT, Elect Engn & Comp Sci, Cambridge, MA 02139 USA
[2] Harvard Univ, Ctr Math Sci & Applicat, Cambridge, MA 02138 USA
[3] NYU, Courant Inst Math Sci, New York, NY 10003 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Estimating the per-state expected cumulative rewards is a critical aspect of reinforcement learning approaches, however the experience is obtained, yet standard deep neural-network function-approximation methods are often inefficient in this setting. An alternative approach, exemplified by value iteration networks, is to learn transition and reward models of a latent Markov decision process whose value predictions fit the data. This approach has been shown empirically to converge faster to a more robust solution in many cases, but there has been little theoretical study of this phenomenon. In this paper, we explore such implicit representations of value functions via theory and focused experimentation. We prove that, for a linear parametrization, gradient descent converges to global optima despite the non-linearity and non-convexity introduced by the implicit representation. Furthermore, we derive convergence rates for both cases, which allow us to identify conditions under which stochastic gradient descent (SGD) with this implicit representation converges substantially faster than its explicit counterpart. Finally, we provide empirical results in some simple domains that illustrate the theoretical findings.
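To make the distinction concrete, the following minimal sketch (in NumPy, not the authors' code) contrasts the two parameterizations the abstract describes: instead of parameterizing the value function V directly, the "implicit" approach parameterizes a latent MDP (here, a fixed transition matrix P and learnable rewards r) and defines V as the Bellman fixed point of that latent model, fitted to value targets by gradient descent. The 3-state MDP, targets, and step size are illustrative assumptions; for this linear parametrization V = (I − γP)⁻¹ r, so the Jacobian of V with respect to r is available in closed form.

```python
import numpy as np

def value_iteration(P, r, gamma=0.9, iters=200):
    """Implicit value function: fixed point of V = r + gamma * P @ V."""
    V = np.zeros(len(r))
    for _ in range(iters):
        V = r + gamma * P @ V
    return V

n = 3
gamma = 0.9
P = np.full((n, n), 1.0 / n)            # fixed latent transitions (row-stochastic)
r = np.zeros(n)                          # learnable latent rewards
V_target = np.array([1.0, 2.0, 3.0])     # illustrative per-state value targets

# For the linear parametrization, V(r) = (I - gamma P)^{-1} r, so the
# gradient of 0.5 * ||V(r) - V_target||^2 w.r.t. r is J^T (V - V_target).
J = np.linalg.inv(np.eye(n) - gamma * P)
for _ in range(2000):
    V = value_iteration(P, r, gamma)
    r -= 0.01 * J.T @ (V - V_target)

# After training, the implicit value predictions match the targets.
print(np.round(value_iteration(P, r, gamma), 3))
```

Despite the objective being non-convex in a general latent model, this linear case converges to the global optimum, which is the phenomenon the paper analyzes; an "explicit" baseline would instead update V itself directly against the same targets.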
Pages: 12