A Finite Sample Complexity Bound for Distributionally Robust Q-learning

Times cited: 0
Authors
Wang, Shengbo [1]
Si, Nian [2]
Blanchet, Jose [1]
Zhou, Zhengyuan [3]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Univ Chicago, Chicago, IL USA
[3] NYU, New York, NY USA
Funding
U.S. National Science Foundation;
Keywords
OPTIMIZATION; UNCERTAINTY; GO;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision process formulation, we extend the distributionally robust Q-learning framework studied in Liu et al. (2022). Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm to learn the optimal robust Q-function within an $\epsilon$ error in the sup norm is upper bounded by $\tilde{O}\left(|S||A|(1-\gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6}\delta^{-4}\right)$, where $\gamma$ is the discount rate, $p_{\wedge}$ is the non-zero minimal support probability of the transition kernels and $\delta$ is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation studies further validate our theoretical results.
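The abstract names the two ingredients (a robust Bellman target and a multi-level Monte Carlo estimator) without giving formulas. The Python below is a minimal sketch of the general technique under stated assumptions, not the authors' algorithm: the uncertainty set is assumed to be a KL-divergence ball of radius $\delta$ around the nominal distribution (our assumption about the Liu et al. (2022) setting; this abstract does not state it), the inner infimum is solved through its standard convex dual, and the estimator is a generic Blanchet-Glynn-style randomized multilevel construction rather than the improved design described above. The names `kl_robust_value` and `mlmc_robust_estimate`, the level parameter `r = 0.6`, and the golden-section solver are hypothetical illustration choices.

```python
import math
import random


def kl_robust_value(values, weights, delta, iters=80):
    """Worst-case mean of a discrete distribution over a KL ball of radius delta > 0.

    ASSUMPTION: KL uncertainty set. Uses the standard convex dual
        inf_{P : KL(P || P0) <= delta} E_P[V]
            = sup_{alpha > 0} -alpha * log E_P0[exp(-V / alpha)] - alpha * delta,
    whose objective is concave in alpha, so golden-section search finds the max.
    Assumes every weight is strictly positive.
    """
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return v_min  # V is constant, so every distribution has the same mean

    def objective(alpha):
        # Shift by v_min so exp() never overflows (log-sum-exp style).
        mgf = sum(w * math.exp(-(v - v_min) / alpha) for v, w in zip(values, weights))
        return v_min - alpha * math.log(mgf) - alpha * delta

    # By Jensen, objective(alpha) <= E[V] - alpha * delta, so any alpha beyond
    # (v_max - v_min) / delta scores below v_min, the limit as alpha -> 0.
    lo, hi = 1e-12, (v_max - v_min) / delta
    ratio = (math.sqrt(5) - 1) / 2
    a, b = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    fa, fb = objective(a), objective(b)
    for _ in range(iters):
        if fa < fb:  # maximiser lies in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + ratio * (hi - lo)
            fb = objective(b)
        else:  # maximiser lies in [lo, b]
            hi, b, fb = b, a, fa
            a = hi - ratio * (hi - lo)
            fa = objective(a)
    # Include the alpha -> 0 end, where the objective tends to v_min.
    return max(fa, fb, objective(1e-12))


def mlmc_robust_estimate(sample, delta, r=0.6, rng=random):
    """One draw of a randomized multilevel Monte Carlo estimate of the KL-robust mean.

    sample() returns one i.i.d. draw of V from the nominal model (e.g. a simulator).
    ASSUMPTION: generic Blanchet-Glynn construction, not the paper's refined estimator.
    """
    # Random level N with P(N = n) = r * (1 - r) ** n, n = 0, 1, 2, ...
    n = 0
    while rng.random() >= r:
        n += 1
    p_n = r * (1 - r) ** n

    draws = [sample() for _ in range(2 ** (n + 1))]
    odd, even = draws[0::2], draws[1::2]  # 1st, 3rd, ... and 2nd, 4th, ... draws

    def robust_empirical(xs):
        # Robust value of the empirical distribution of xs (uniform weights).
        return kl_robust_value(xs, [1.0 / len(xs)] * len(xs), delta)

    # Antithetic level difference: first-order fluctuations cancel between halves.
    level_diff = robust_empirical(draws) - 0.5 * (robust_empirical(odd) + robust_empirical(even))
    # Base term: a one-sample empirical law is a point mass, whose robust value is
    # the sample itself. Dividing by p_n makes the telescoping sum unbiased.
    return sample() + level_diff / p_n
```

In a robust Q-learning loop, `sample` would plausibly draw a next state $s'$ from the simulator and return $\max_a Q(s', a)$, with the update target formed as the reward plus $\gamma$ times the estimate. Choosing `r` in $(1/2, 3/4)$ keeps the expected number of draws per estimate finite ($2r/(2r-1) = 6$ for the multilevel block at `r = 0.6`, plus one base draw) while, for sufficiently smooth functionals, keeping the variance finite as well.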
Pages: 29
Related papers (50 in total)
  • [21] Robust diagnostic classification via Q-learning
    Ardulov, Victor
    Martinez, Victor R.
    Somandepalli, Krishna
    Zheng, Shuting
    Salzman, Emma
    Lord, Catherine
    Bishop, Somer
    Narayanan, Shrikanth
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • [23] On-Off Adversarially Robust Q-Learning
    Sahoo, Prachi Pratyusha
    Vamvoudakis, Kyriakos G.
    IEEE CONTROL SYSTEMS LETTERS, 2020, 4 (03): 749-754
  • [24] Split Deep Q-Learning for Robust Object Singulation
    Sarantopoulos, Iason
    Kiatos, Marios
    Doulgeri, Zoe
    Malassiotis, Sotiris
    2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2020: 6225-6231
  • [25] Convergence and bound computation for chance constrained distributionally robust models using sample approximation
    Lei, Jiaqi
    Mehrotra, Sanjay
    OPERATIONS RESEARCH LETTERS, 2025, 60
  • [26] Enhanced Dynamic Expansion Planning Model Incorporating Q-Learning and Distributionally Robust Optimization for Resilient and Cost-Efficient Distribution Networks
    Lu, Gang
    Yuan, Bo
    Nie, Baorui
    Xia, Peng
    Wu, Cong
    Sun, Guangzeng
    ENERGIES, 2025, 18 (05)
  • [27] Q-LEARNING
    WATKINS, CJCH
    DAYAN, P
    MACHINE LEARNING, 1992, 8 (3-4): 279-292
  • [28] Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
    Tan, Fuxiao
    Yan, Pengfei
    Guan, Xinping
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637: 475-483
  • [29] Self-Imitation Learning via Generalized Lower Bound Q-learning
    Tang, Yunhao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [30] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (09): 2184-2193