A Finite Sample Complexity Bound for Distributionally Robust Q-learning

Cited: 0
Authors
Wang, Shengbo [1 ]
Si, Nian [2 ]
Blanchet, Jose [1 ]
Zhou, Zhengyuan [3 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Univ Chicago, Chicago, IL USA
[3] NYU, New York, NY USA
Funding
National Science Foundation (USA);
Keywords
OPTIMIZATION; UNCERTAINTY; GO;
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision processes formulation, we extend the distributionally robust Q-learning framework studied in Liu et al. (2022). Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm to learn the optimal robust Q-function within an ε error in the sup norm is upper bounded by Õ(|S||A|(1 − γ)^{-5} ε^{-2} p∧^{-6} δ^{-4}), where γ is the discount rate, p∧ is the non-zero minimal support probability of the transition kernels, and δ is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation studies further validate our theoretical results.
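To make the abstract concrete, the robust Bellman operator it refers to takes a worst-case expectation over transition kernels in a δ-ball around the nominal kernel. The sketch below is an illustrative toy, not the paper's model-free algorithm or its multi-level Monte Carlo estimator: it runs model-based robust value iteration under a KL uncertainty set, solving the inner worst case via the standard distributionally robust dual form sup_{α>0} {−α log E_p[exp(−V/α)] − αδ}. All function names and the grid-search optimizer are assumptions for illustration.

```python
import numpy as np

def kl_worst_case(values, probs, delta, alphas=None):
    """Approximate inf of E_q[values] over distributions q with
    KL(q || probs) <= delta, using the dual form
        sup_{alpha > 0} -alpha * log E_p[exp(-V / alpha)] - alpha * delta.
    The 1-D sup is approximated by a crude grid search over alpha."""
    if alphas is None:
        alphas = np.logspace(-3, 3, 200)
    vmin = values.min()          # shift values for log-sum-exp stability
    shifted = values - vmin
    best = -np.inf
    for a in alphas:
        obj = vmin - a * np.log(probs @ np.exp(-shifted / a)) - a * delta
        best = max(best, obj)
    return best

def robust_q_iteration(P, R, gamma, delta, iters=200):
    """Iterate the robust Bellman operator
        Q(s,a) = R(s,a) + gamma * inf_{q in KL-ball} E_q[max_a' Q(s',a')]
    on a known model. P has shape (S, A, S); R has shape (S, A)."""
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        v = Q.max(axis=1)        # greedy value function
        Q = np.array([[R[s, a] + gamma * kl_worst_case(v, P[s, a], delta)
                       for a in range(n_actions)]
                      for s in range(n_states)])
    return Q
```

Because the dual objective is a lower bound on the worst-case expectation for every α, the grid search under-approximates it slightly; a larger uncertainty size δ always yields a smaller (more pessimistic) robust value.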
Pages: 29
Related Papers
50 records in total
  • [31] A Lower Bound for the Sample Complexity of Inverse Reinforcement Learning
    Komanduru, Abi
    Honorio, Jean
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [32] Distributionally Robust Imitation Learning
    Bashiri, Mohammad Ali
    Ziebart, Brian D.
    Zhang, Xinhua
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [33] Q-Learning for Robust Satisfaction of Signal Temporal Logic Specifications
    Aksaray, Derya
    Jones, Austin
    Kong, Zhaodan
    Schwager, Mac
    Belta, Calin
    2016 IEEE 55TH CONFERENCE ON DECISION AND CONTROL (CDC), 2016, : 6565 - 6570
  • [34] Robust flipping stabilization of Boolean networks: A Q-learning approach
    Liu, Zejiao
    Liu, Yang
    Ruan, Qihua
    Gui, Weihua
    SYSTEMS & CONTROL LETTERS, 2023, 176
  • [35] Finite-Sample Guarantees for Wasserstein Distributionally Robust Optimization: Breaking the Curse of Dimensionality
    Gao, Rui
    OPERATIONS RESEARCH, 2023, 71 (06) : 2291 - 2306
  • [36] Making Deep Q-learning Methods Robust to Time Discretization
    Tallec, Corentin
    Blier, Leonard
    Ollivier, Yann
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [37] Q-learning with continuous state spaces and finite decision set
    Barty, Kengy
    Girardeau, Pierre
    Roy, Jean-Sebastien
    Strugarek, Cyrille
    2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007, : 346+
  • [38] Final Iteration Convergence Bound of Q-Learning: Switching System Approach
    Lee, Donghwan
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (07) : 4765 - 4772
  • [39] Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning
    Ohnishi, Shota
    Uchibe, Eiji
    Yamaguchi, Yotaro
    Nakanishi, Kosuke
    Yasui, Yuji
    Ishii, Shin
    FRONTIERS IN NEUROROBOTICS, 2019, 13
  • [40] Learning rates for Q-Learning
    Even-Dar, E
    Mansour, Y
    COMPUTATIONAL LEARNING THEORY, PROCEEDINGS, 2001, 2111 : 589 - 604