Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs

被引:0
作者
Simchowitz, Max [1 ]
Jamieson, Kevin [2 ]
机构
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Univ Washington, Seattle, WA 98195 USA
来源
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | 2019年 / 32卷
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper establishes that optimistic algorithms attain gap-dependent and non-asymptotic logarithmic regret for episodic MDPs. In contrast to prior work, our bounds do not suffer a dependence on diameter-like quantities or ergodicity, and smoothly interpolate between the gap dependent logarithmic-regret, and the O(root HSAT)-minimax rate. The key technique in our analysis is a novel "clipped" regret decomposition which applies to a broad family of recent optimistic algorithms for episodic MDPs.
引用
收藏
页数:10
相关论文
共 14 条
[1]  
Azar MG, 2017, PR MACH LEARN RES, V70
[2]   Integration Strategies for Computational Science & Engineering Software [J].
Bartlett, Roscoe A. .
2009 ICSE WORKSHOP ON SOFTWARE ENGINEERING FOR COMPUTATIONAL SCIENCE AND ENGINEERING, 2009, :35-42
[3]   Optimal adaptive policies for Markov decision processes [J].
Burnetas, AN ;
Katehakis, MN .
MATHEMATICS OF OPERATIONS RESEARCH, 1997, 22 (01) :222-255
[4]  
Dann C., 2018, ARXIV181103056
[5]  
Dann C, 2017, ADV NEUR IN, V30
[6]  
Dann Christoph, 2015, Advances in Neural Information Processing Systems, P2818
[7]  
Garivier Aurelien, 2018, MATH OPERATIONS RES
[8]  
Jaksch T, 2010, J MACH LEARN RES, V11, P1563
[9]  
Jin C., 2018, ADV NEURAL INFORM PR, P4863
[10]  
Ok Jungseul, 2018, Advances in Neural Information Processing Systems, V31, P8888