Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs

Cited by: 0
Authors
Tirinzoni, Andrea [1 ]
Al-Marjani, Aymen [2 ]
Kaufmann, Emilie [3 ]
Affiliations
[1] Meta AI, Paris, France
[2] ENS Lyon, UMPA, Lyon, France
[3] Univ Lille, CNRS, INRIA, Cent Lille, UMR 9189, CRIStAL, Lille, France
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
Keywords
MULTIARMED BANDIT;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In probably approximately correct (PAC) reinforcement learning (RL), an agent is required to identify an ε-optimal policy with probability 1 − δ. While minimax-optimal algorithms exist for this problem, its instance-dependent complexity remains elusive in episodic Markov decision processes (MDPs). In this paper, we propose the first nearly matching (up to a horizon-squared factor and logarithmic terms) upper and lower bounds on the sample complexity of PAC RL in deterministic episodic MDPs with finite state and action spaces. In particular, our bounds feature a new notion of sub-optimality gap for state-action pairs that we call the deterministic return gap. While our instance-dependent lower bound is written as a linear program, our algorithms are very simple and do not require solving such an optimization problem during learning. Their design and analyses employ novel ideas, including graph-theoretical concepts (minimum flows) and a new maximum-coverage exploration strategy.
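For intuition on the gap quantities the abstract refers to, the following is a minimal Python sketch, not the authors' algorithm: it computes the classical per-(stage, state, action) sub-optimality gaps by backward induction in a toy deterministic episodic MDP. All dynamics, rewards, and sizes below are invented for illustration; the paper's deterministic return gap is a refinement of this classical value gap.

# Hedged sketch (not the paper's algorithm): classical sub-optimality gaps
# via backward induction in a toy deterministic episodic MDP.

H = 3              # horizon (number of stages); toy choice
S = [0, 1]         # finite state space; toy choice
A = [0, 1]         # finite action space; toy choice

def step(s, a):
    # Deterministic transition f(s, a) -> next state (arbitrary toy choice).
    return (s + a) % len(S)

def reward(s, a):
    # Deterministic reward (arbitrary toy choice).
    return 1.0 if (s, a) == (1, 1) else 0.1 * a

# Backward induction: V[h][s] = optimal value-to-go from state s at stage h.
V = [{s: 0.0 for s in S} for _ in range(H + 1)]  # V[H] = 0 at the terminal stage
gap = {}
for h in range(H - 1, -1, -1):
    for s in S:
        q = {a: reward(s, a) + V[h + 1][step(s, a)] for a in A}
        V[h][s] = max(q.values())
        for a in A:
            # Classical value gap: loss incurred by playing a at (s, h) and
            # then acting optimally; zero for every optimal action.
            gap[(h, s, a)] = V[h][s] - q[a]

for (h, s, a), g in sorted(gap.items()):
    print(f"h={h} s={s} a={a}  gap={g:.2f}")

Running the sketch prints a zero gap for each optimal action and a positive gap otherwise; in the paper these gap quantities (suitably refined) drive the instance-dependent sample complexity.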
Pages: 14