Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs

被引：0

作者：

Tirinzoni, Andrea ^{[1
]}

Al-Marjani, Aymen ^{[2
]}

Kaufmann, Emilie ^{[3
]}

机构：

[1] Meta AI, Paris, France

[2] ENS Lyon, UMPA, Lyon, France

[3] Univ Lille, CNRS, INRIA, Cent Lille,UMR 9189,CRIStAL, Lille, France

来源：

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022 | 2022年

关键词：

MULTIARMED BANDIT;

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

In probably approximately correct (PAC) reinforcement learning (RL), an agent is required to identify an epsilon-optimal policy with probability 1 - delta. While minimax optimal algorithms exist for this problem, its instance-dependent complexity remains elusive in episodic Markov decision processes (MDPs). In this paper, we propose the first nearly matching (up to a horizon squared factor and logarithmic terms) upper and lower bounds on the sample complexity of PAC RL in deterministic episodic MDPs with finite state and action spaces. In particular, our bounds feature a new notion of sub-optimality gap for state-action pairs that we call the deterministic return gap. While our instance-dependent lower bound is written as a linear program, our algorithms are very simple and do not require solving such an optimization problem during learning. Their design and analyses employ novel ideas, including graph-theoretical concepts (minimum flows) and a new maximum-coverage exploration strategy.

引用

页数：14

共 50 条

[1] Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
Yin, Ming
Wang, Yu-Xiang
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
[2] Instance-optimal PAC Algorithms for Contextual Bandits
Li, Zhaoqi
Ratliff, Lillian
Nassif, Houssam
Jamieson, Kevin
Jain, Lalit
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
[3] Near-optimal Reinforcement Learning in Factored MDPs
Osband, Ian
Van Roy, Benjamin
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
[4] Reinforcement learning in finite MDPs: PAC analysis
Strehl, Alexander L.
Li, Hong
Littman, Michael L.
Journal of Machine Learning Research, 2009, 10 : 2413 - 2444
[5] Reinforcement Learning in Finite MDPs: PAC Analysis
Strehl, Alexander L.
Li, Lihong
Littman, Michael L.
JOURNAL OF MACHINE LEARNING RESEARCH, 2009, 10 : 2413 - 2444
[6] The SMART Approach to Instance-Optimal Online Learning
Banerjee, Siddhartha
Bhatt, Alankrita
Yu, Christina Lee
THIRTY SEVENTH ANNUAL CONFERENCE ON LEARNING THEORY, 2023, 247
[7] Near-optimal PAC bounds for discounted MDPs
Lattimore, Tor
Hutter, Marcus
THEORETICAL COMPUTER SCIENCE, 2014, 558 : 125 - 143
[8] Instance-Optimal Geometric Algorithms
Afshani, Peyman
Barbay, Jeremy
Chan, Timothy M.
JOURNAL OF THE ACM, 2017, 64 (01)
[9] Instance-Optimal Geometric Algorithms
Afshani, Peyman
Barbay, Jeremy
Chan, Timothy M.
2009 50TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE: FOCS 2009, PROCEEDINGS, 2009, : 129 - 138
[10] Data Amplification: Instance-Optimal Property Estimation
Hao, Yi
Orlitsky, Alon
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119

← 1 2 3 4 5 →