Speedy Categorical Distributional Reinforcement Learning and Complexity Analysis

Cited by: 0

Authors:

Boeck, Markus [1]

Heitzinger, Clemens [1]

Affiliations:

[1] TU Wien, Dept Math & Geoinformat, Vienna, Austria

Source:

SIAM Journal on Mathematics of Data Science | 2022 / Vol. 4 / No. 2

Keywords:

reinforcement learning; distributional reinforcement learning; Q-learning; PAC bounds; complexity analysis;

DOI:

10.1137/20M1364436

Chinese Library Classification:

O29 [Applied Mathematics];

Discipline classification code:

070104;

Abstract:

In distributional reinforcement learning, the entire distribution of the return instead of just the expected return is modeled. The approach with categorical distributions as the approximation method is well-known in Q-learning, and convergence results have been established in the tabular case. In this work, speedy Q-learning is extended to categorical distributions, a finite-time analysis is performed, and probably approximately correct bounds in terms of the Cramér distance are established. It is shown that also in the distributional case the new update rule yields faster policy evaluation in comparison to the standard Q-learning one and that the sample complexity is essentially the same as the one of the value-based algorithmic counterpart. Without the need for more state-action-reward samples, one gains significantly more information about the return with categorical distributions. Even though the results do not easily extend to the case of policy control, a slight modification to the update rule yields promising numerical results.
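The abstract states its bounds in terms of the Cramér distance. As a reading aid only, the following is a minimal Python sketch of that distance for two categorical distributions that share the same evenly spaced atoms, using the standard definition as the L2 norm of the difference between the two cumulative distribution functions. The function name `cramer_distance`, the parameter `atom_spacing`, and the example distributions are hypothetical and are not taken from the paper; the sketch does not reproduce the authors' algorithm.

```python
from itertools import accumulate
from math import isclose, sqrt


def cramer_distance(p, q, atom_spacing):
    """Cramér (l2) distance between two categorical distributions.

    Assumption: p and q are probability vectors over the SAME evenly
    spaced atoms, and atom_spacing is the gap between adjacent atoms.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same atoms")
    # Cumulative distribution functions evaluated at each atom.
    cdf_p = accumulate(p)
    cdf_q = accumulate(q)
    # The CDFs are piecewise constant between atoms, so the integral of the
    # squared difference is a sum of squared gaps times the atom spacing.
    # The gap at the last atom is zero because both CDFs reach 1 there.
    squared_gap = sum((a - b) ** 2 for a, b in zip(cdf_p, cdf_q))
    return sqrt(atom_spacing * squared_gap)


# Hypothetical return distributions on the atoms 0, 1, 2 (spacing 1).
p = [1.0, 0.0, 0.0]  # all mass on the lowest atom
q = [0.0, 0.0, 1.0]  # all mass on the highest atom
distance = cramer_distance(p, q, atom_spacing=1.0)
```

Working on cumulative sums keeps the computation linear in the number of atoms, and identical distributions give a distance of zero.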


Pages: 675-693

Page count: 19

11 items in total

[1]

AZAR M. G., 2011, ADV NEURAL INFORM PR, P2411

[2]

Bellemare MG, 2017, PR MACH LEARN RES, V70

[3] DYNAMIC PROGRAMMING [J].

BELLMAN, R.

SCIENCE, 1966, 153 (3731) :34-&

[4]

Even-Dar E, 2003, J MACH LEARN RES, V5, P1

[5]

GHAVAMZADEH M., 2011, REINFORCEMENT LEARNI

[6]

Kallenberg O, 2017, PROB THEOR STOCH MOD, V77, P1, DOI 10.1007/978-3-319-41598-7

[7]

Lyle C, 2019, Arxiv, DOI arXiv:1901.11084

[8]

Rowland M., 2018, INT C ARTIFICIAL INT, P29

[9]

Sutton R. S., 1988, Machine Learning, V3, P9, DOI 10.1007/BF00115009

[10]

Sutton RS, 2018, ADAPT COMPUT MACH LE, P1
