Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning

Cited: 0
Authors
Zhang, Zihan [1 ]
Jiang, Yuhang [1 ]
Zhou, Yuan [2 ,3 ]
Ji, Xiangyang [1 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
[2] Tsinghua Univ, Yau Math Sci Ctr, Beijing, Peoples R China
[3] Tsinghua Univ, Dept Math Sci, Beijing, Peoples R China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022 | 2022
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study the episodic reinforcement learning (RL) problem modeled by finite-horizon Markov Decision Processes (MDPs) with a constraint on the number of batches. In the multi-batch reinforcement learning framework, the agent must commit to a schedule of policy-update times before learning begins, which makes it particularly suitable for scenarios where adaptively changing the policy is costly. Given a finite-horizon MDP with S states, A actions, and planning horizon H, we design a computationally efficient algorithm that achieves near-optimal regret of $\tilde{O}(\sqrt{SAH^3 K \ln(1/\delta)})$ over K episodes using $O(H + \log_2 \log_2 K)$ batches, where $\delta$ is the confidence parameter. To the best of our knowledge, this is the first $\tilde{O}(\sqrt{SAH^3 K})$ regret bound achieved with $O(H + \log_2 \log_2 K)$ batch complexity. Meanwhile, we show that achieving $\tilde{O}(\mathrm{poly}(S, A, H)\sqrt{K})$ regret requires at least $\Omega(H/\log_A K + \log_2 \log_2 K)$ batches, which matches our upper bound up to logarithmic terms. Our technical contributions are two-fold: 1) a near-optimal design scheme to explore the unlearned states; 2) a computationally efficient algorithm to explore certain directions with an approximate transition model.
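To make the batch-complexity claim concrete, the following minimal Python sketch builds a pre-specified batch schedule of the kind the abstract describes: H warm-up batches followed by a doubling-style grid, giving O(H + log_2 log_2 K) batches in total. The function name batch_schedule, the warm-up lengths, and the grid endpoints K^(1 - 2^{-i}) are illustrative assumptions, not the paper's exact construction.

import math

def batch_schedule(K: int, H: int) -> list[int]:
    """Return a pre-specified list of batch endpoints (episode indices).

    Hypothetical construction for illustration only: H equal warm-up
    batches, then a doubling-style grid with endpoints ~ K**(1 - 2**-i)
    for i = 1 .. ceil(log2 log2 K), and a final batch ending at K.
    Total number of batches: O(H + log2 log2 K).
    """
    endpoints: list[int] = []

    # Stage 1: H warm-up batches of equal length (hypothetical choice).
    warmup = max(1, K // (10 * H))
    for i in range(1, H + 1):
        endpoints.append(min(i * warmup, K))

    # Stage 2: doubling grid; after ~log2 log2 K steps the grid reaches ~K/2.
    num_grid = max(1, math.ceil(math.log2(math.log2(K))))
    for i in range(1, num_grid + 1):
        t = math.ceil(K ** (1.0 - 2.0 ** (-i)))
        if endpoints[-1] < t < K:
            endpoints.append(t)

    # Final batch covers the remaining episodes.
    if endpoints[-1] < K:
        endpoints.append(K)
    return endpoints

if __name__ == "__main__":
    K, H = 10**6, 10
    sched = batch_schedule(K, H)
    print(f"{len(sched)} batches for K={K}, H={H}: {sched}")

For example, with K = 10^6 and H = 10 this schedule uses roughly 14 batches: the H warm-up batches plus about log_2 log_2 K grid points and a final batch.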
Pages: 11
Related Papers
50 records in total
  • [41] Optimal Regret Bounds for Collaborative Learning in Bandits
    Shidani, Amitis
    Vakili, Sattar
    INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 237, 2024, 237
  • [42] A Biased Graph Neural Network Sampler with Near-Optimal Regret
    Zhang, Qingru
    Wipf, David
    Gan, Quan
    Song, Le
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [44] A Bayesian reinforcement learning approach in markov games for computing near-optimal policies
    Clempner, Julio B.
    ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE, 2023, 91 (05) : 675 - 690
  • [45] Near-optimal reinforcement learning framework for energy-aware sensor communications
    Pandana, C
    Liu, KJR
    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2005, 23 (04) : 788 - 797
  • [46] Near-Optimal Vehicular Crowdsensing Task Allocation Empowered by Deep Reinforcement Learning
    Xiang C.-C.
    Li Y.-Y.
    Feng L.
    Chen C.
    Guo S.-T.
    Yang P.-L.
    Jisuanji Xuebao/Chinese Journal of Computers, 2022, 45 (05) : 918 - 934
  • [47] Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation for Reinforcement Learning
    Yin, Ming
    Bai, Yu
    Wang, Yu-Xiang
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [48] Near-Optimal Sample Complexity Bounds for Constrained MDPs
    Vaswani, Sharan
    Yang, Lin F.
    Szepesvari, Csaba
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [49] Near-Optimal Complexity Bounds for Fragments of the Skolem Problem
    Akshay, S.
    Balaji, Nikhil
    Murhekar, Aniket
    Varma, Rohith
    Vyas, Nikhil
    37TH INTERNATIONAL SYMPOSIUM ON THEORETICAL ASPECTS OF COMPUTER SCIENCE (STACS 2020), 2020, 154
  • [50] Near-optimal lower bounds on the multi-party communication complexity of set disjointness
    Chakrabarti, A
    Khot, S
    Sun, XD
    18TH IEEE ANNUAL CONFERENCE ON COMPUTATIONAL COMPLEXITY, PROCEEDINGS, 2003 : 107 - 117