Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs

被引:0
|
作者
Tirinzoni, Andrea [1 ]
Al-Marjani, Aymen [2 ]
Kaufmann, Emilie [3 ]
机构
[1] Meta AI, Paris, France
[2] ENS Lyon, UMPA, Lyon, France
[3] Univ Lille, CNRS, INRIA, Cent Lille,UMR 9189,CRIStAL, Lille, France
关键词
MULTIARMED BANDIT;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In probably approximately correct (PAC) reinforcement learning (RL), an agent is required to identify an epsilon-optimal policy with probability 1 - delta. While minimax optimal algorithms exist for this problem, its instance-dependent complexity remains elusive in episodic Markov decision processes (MDPs). In this paper, we propose the first nearly matching (up to a horizon squared factor and logarithmic terms) upper and lower bounds on the sample complexity of PAC RL in deterministic episodic MDPs with finite state and action spaces. In particular, our bounds feature a new notion of sub-optimality gap for state-action pairs that we call the deterministic return gap. While our instance-dependent lower bound is written as a linear program, our algorithms are very simple and do not require solving such an optimization problem during learning. Their design and analyses employ novel ideas, including graph-theoretical concepts (minimum flows) and a new maximum-coverage exploration strategy.
引用
收藏
页数:14
相关论文
共 50 条
  • [1] Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
    Yin, Ming
    Wang, Yu-Xiang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [2] Instance-optimal PAC Algorithms for Contextual Bandits
    Li, Zhaoqi
    Ratliff, Lillian
    Nassif, Houssam
    Jamieson, Kevin
    Jain, Lalit
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [3] Near-optimal Reinforcement Learning in Factored MDPs
    Osband, Ian
    Van Roy, Benjamin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [4] Reinforcement learning in finite MDPs: PAC analysis
    Strehl, Alexander L.
    Li, Hong
    Littman, Michael L.
    Journal of Machine Learning Research, 2009, 10 : 2413 - 2444
  • [5] Reinforcement Learning in Finite MDPs: PAC Analysis
    Strehl, Alexander L.
    Li, Lihong
    Littman, Michael L.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2009, 10 : 2413 - 2444
  • [6] The SMART Approach to Instance-Optimal Online Learning
    Banerjee, Siddhartha
    Bhatt, Alankrita
    Yu, Christina Lee
    THIRTY SEVENTH ANNUAL CONFERENCE ON LEARNING THEORY, 2023, 247
  • [7] Near-optimal PAC bounds for discounted MDPs
    Lattimore, Tor
    Hutter, Marcus
    THEORETICAL COMPUTER SCIENCE, 2014, 558 : 125 - 143
  • [8] Instance-Optimal Geometric Algorithms
    Afshani, Peyman
    Barbay, Jeremy
    Chan, Timothy M.
    JOURNAL OF THE ACM, 2017, 64 (01)
  • [9] Instance-Optimal Geometric Algorithms
    Afshani, Peyman
    Barbay, Jeremy
    Chan, Timothy M.
    2009 50TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE: FOCS 2009, PROCEEDINGS, 2009, : 129 - 138
  • [10] Data Amplification: Instance-Optimal Property Estimation
    Hao, Yi
    Orlitsky, Alon
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119