Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs

Cited by: 0
Authors
Tirinzoni, Andrea [1 ]
Al-Marjani, Aymen [2 ]
Kaufmann, Emilie [3 ]
Affiliations
[1] Meta AI, Paris, France
[2] ENS Lyon, UMPA, Lyon, France
[3] Univ Lille, CNRS, INRIA, Cent Lille, UMR 9189, CRIStAL, Lille, France
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
Keywords
MULTIARMED BANDIT;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In probably approximately correct (PAC) reinforcement learning (RL), an agent is required to identify an ε-optimal policy with probability 1 − δ. While minimax-optimal algorithms exist for this problem, its instance-dependent complexity remains elusive in episodic Markov decision processes (MDPs). In this paper, we propose the first nearly matching (up to a horizon-squared factor and logarithmic terms) upper and lower bounds on the sample complexity of PAC RL in deterministic episodic MDPs with finite state and action spaces. In particular, our bounds feature a new notion of sub-optimality gap for state-action pairs that we call the deterministic return gap. While our instance-dependent lower bound is written as a linear program, our algorithms are very simple and do not require solving such an optimization problem during learning. Their design and analyses employ novel ideas, including graph-theoretical concepts (minimum flows) and a new maximum-coverage exploration strategy.
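For intuition on the gap quantities the abstract refers to, the following is a minimal Python sketch, not the authors' algorithm: it computes the classical per-(stage, state, action) sub-optimality gaps by backward induction in a toy deterministic episodic MDP. All dynamics, rewards, and sizes below are invented for illustration; the paper's deterministic return gap is a refinement of this classical value gap.

# Hedged sketch (not the paper's algorithm): classical sub-optimality gaps
# via backward induction in a toy deterministic episodic MDP.

H = 3              # horizon (number of stages); toy choice
S = [0, 1]         # finite state space; toy choice
A = [0, 1]         # finite action space; toy choice

def step(s, a):
    # Deterministic transition f(s, a) -> next state (arbitrary toy choice).
    return (s + a) % len(S)

def reward(s, a):
    # Deterministic reward (arbitrary toy choice).
    return 1.0 if (s, a) == (1, 1) else 0.1 * a

# Backward induction: V[h][s] = optimal value-to-go from state s at stage h.
V = [{s: 0.0 for s in S} for _ in range(H + 1)]  # V[H] = 0 at the terminal stage
gap = {}
for h in range(H - 1, -1, -1):
    for s in S:
        q = {a: reward(s, a) + V[h + 1][step(s, a)] for a in A}
        V[h][s] = max(q.values())
        for a in A:
            # Classical value gap: loss incurred by playing a at (s, h) and
            # then acting optimally; zero for every optimal action.
            gap[(h, s, a)] = V[h][s] - q[a]

for (h, s, a), g in sorted(gap.items()):
    print(f"h={h} s={s} a={a}  gap={g:.2f}")

Running the sketch prints a zero gap for each optimal action and a positive gap otherwise; in the paper these gap quantities (suitably refined) drive the instance-dependent sample complexity.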
Pages: 14