Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs

被引:0
作者
Simchowitz, Max [1 ]
Jamieson, Kevin [2 ]
机构
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Univ Washington, Seattle, WA 98195 USA
来源
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | 2019年 / 32卷
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper establishes that optimistic algorithms attain gap-dependent and non-asymptotic logarithmic regret for episodic MDPs. In contrast to prior work, our bounds do not suffer a dependence on diameter-like quantities or ergodicity, and smoothly interpolate between the gap dependent logarithmic-regret, and the O(root HSAT)-minimax rate. The key technique in our analysis is a novel "clipped" regret decomposition which applies to a broad family of recent optimistic algorithms for episodic MDPs.
引用
收藏
页数:10
相关论文
共 14 条
[11]  
Osband I., 2016, On lower bounds for regret in reinforcement learning
[12]  
Tewari A., 2008, Advances in Neural Information Processing Systems, P1505
[13]  
Tewari Ambuj, 2007, Reinforcement learning in large or unknown MDPs
[14]  
Zanette A., 2019, ARXIV190100210