Speedy Categorical Distributional Reinforcement Learning and Complexity Analysis

Times cited: 0
Authors
Boeck, Markus [1]
Heitzinger, Clemens [1]
Affiliations
[1] TU Wien, Dept Math & Geoinformat, Vienna, Austria
Source
SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE | 2022 / Vol. 4 / No. 2
Keywords
reinforcement learning; distributional reinforcement learning; Q-learning; PAC bounds; complexity analysis;
DOI
10.1137/20M1364436
Chinese Library Classification (CLC) number
O29 [Applied Mathematics];
Discipline classification code
070104;
Abstract
In distributional reinforcement learning, the entire distribution of the return instead of just the expected return is modeled. The approach with categorical distributions as the approximation method is well-known in Q-learning, and convergence results have been established in the tabular case. In this work, speedy Q-learning is extended to categorical distributions, a finite-time analysis is performed, and probably approximately correct bounds in terms of the Cramér distance are established. It is shown that also in the distributional case the new update rule yields faster policy evaluation in comparison to the standard Q-learning one and that the sample complexity is essentially the same as the one of the value-based algorithmic counterpart. Without the need for more state-action-reward samples, one gains significantly more information about the return with categorical distributions. Even though the results do not easily extend to the case of policy control, a slight modification to the update rule yields promising numerical results.
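Illustrative note (not part of the published abstract): the bounds above are stated in the Cramér distance, which for two distributions is the L2 norm of the difference of their cumulative distribution functions. The following is a minimal sketch of that distance for two categorical distributions, assuming both live on the same sorted set of atoms. The function name `cramer_distance` and its arguments are hypothetical and are not taken from the paper.

```python
import numpy as np


def cramer_distance(p, q, atoms):
    """Cramér (l2) distance between two categorical distributions.

    Assumption: p and q are probability vectors over the same sorted
    support `atoms`. The CDFs are then step functions that are constant
    between consecutive atoms, so the integral of the squared CDF
    difference reduces to a finite sum.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    atoms = np.asarray(atoms, dtype=float)

    # Difference of the two CDFs evaluated at each atom.
    cdf_diff = np.cumsum(p) - np.cumsum(q)

    # Width of each interval [atoms[i], atoms[i+1]) on which the
    # CDF difference is constant.
    widths = np.diff(atoms)

    # The last CDF entry is 1 for both distributions, so it drops out.
    return float(np.sqrt(np.sum(cdf_diff[:-1] ** 2 * widths)))


if __name__ == "__main__":
    atoms = [0.0, 1.0, 2.0]
    # Two point masses at opposite ends of the support.
    print(cramer_distance([1, 0, 0], [0, 0, 1], atoms))
```

Working with CDF differences rather than per-atom probability differences makes the distance sensitive to how far apart the mass sits on the return axis, not only to whether the probabilities differ.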
Pages: 675-693
Number of pages: 19