Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs

被引：0

作者：

Simchowitz, Max ^{[1
]}

Jamieson, Kevin ^{[2
]}

机构：

[1] Univ Calif Berkeley, Berkeley, CA 94720 USA

[2] Univ Washington, Seattle, WA 98195 USA

来源：

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | 2019年 / 32卷

关键词：

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

This paper establishes that optimistic algorithms attain gap-dependent and non-asymptotic logarithmic regret for episodic MDPs. In contrast to prior work, our bounds do not suffer a dependence on diameter-like quantities or ergodicity, and smoothly interpolate between the gap dependent logarithmic-regret, and the O(root HSAT)-minimax rate. The key technique in our analysis is a novel "clipped" regret decomposition which applies to a broad family of recent optimistic algorithms for episodic MDPs.

引用

页数：10

共 14 条

[1]

Azar MG, 2017, PR MACH LEARN RES, V70

[2] Integration Strategies for Computational Science & Engineering Software [J].

Bartlett, Roscoe A. .

2009 ICSE WORKSHOP ON SOFTWARE ENGINEERING FOR COMPUTATIONAL SCIENCE AND ENGINEERING, 2009, :35-42

[3] Optimal adaptive policies for Markov decision processes [J].

Burnetas, AN ;

Katehakis, MN .

MATHEMATICS OF OPERATIONS RESEARCH, 1997, 22 (01) :222-255

[4]

Dann C., 2018, ARXIV181103056

[5]

Dann C, 2017, ADV NEUR IN, V30

[6]

Dann Christoph, 2015, Advances in Neural Information Processing Systems, P2818

[7]

Garivier Aurelien, 2018, MATH OPERATIONS RES

[8]

Jaksch T, 2010, J MACH LEARN RES, V11, P1563

[9]

Jin C., 2018, ADV NEURAL INFORM PR, P4863

[10]

Ok Jungseul, 2018, Advances in Neural Information Processing Systems, V31, P8888

← 1 2 →