A Finite Sample Complexity Bound for Distributionally Robust Q-learning

Times cited: 0
Authors
Wang, Shengbo [1 ]
Si, Nian [2 ]
Blanchet, Jose [1 ]
Zhou, Zhengyuan [3 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Univ Chicago, Chicago, IL USA
[3] NYU, New York, NY USA
Funding
National Science Foundation (USA)
Keywords
OPTIMIZATION; UNCERTAINTY; GO
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision processes formulation, we extend the distributionally robust Q-learning framework studied in Liu et al. (2022). Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm to learn the optimal robust Q-function within an ε error in the sup norm is upper bounded by Õ(|S||A|(1-γ)^{-5} ε^{-2} p_∧^{-6} δ^{-4}), where γ is the discount rate, p_∧ is the non-zero minimal support probability of the transition kernels, and δ is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation studies further validate our theoretical results.
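The multi-level Monte Carlo estimator mentioned in the abstract builds on the randomized-level debiasing idea: draw a random level, compare an estimate using all samples at that level against the average of two half-sample estimates, and reweight by the level probability. The sketch below shows this generic scheme for unbiasedly estimating g(E[X]) with a nonlinear g (in the robust Q-learning setting, g plays the role of the nonlinear dual functional inside the robust Bellman operator). The names `mlmc_estimate`, `sample`, `g`, and `p_geo` are illustrative; this is not the paper's exact estimator.

```python
import numpy as np

def mlmc_estimate(sample, g, p_geo=0.5, rng=None):
    """One draw of a randomized multi-level Monte Carlo estimator of g(E[X]),
    where g is a nonlinear scalar function and sample(n) returns n i.i.d.
    draws of X as a NumPy array. Unbiasedness comes from reweighting the
    level-N correction term by the probability of selecting level N.
    """
    rng = np.random.default_rng() if rng is None else rng
    N = rng.geometric(p_geo) - 1             # random level N in {0, 1, 2, ...}
    n = 2 ** (N + 1)                         # samples used at level N
    x = sample(n)
    # correction: full-sample estimate minus the average of the two
    # half-sample (even/odd index) estimates; shrinks as N grows
    delta = g(x.mean()) - 0.5 * (g(x[::2].mean()) + g(x[1::2].mean()))
    p_N = p_geo * (1 - p_geo) ** N           # P(level = N)
    return g(x[0]) + delta / p_N             # base term + reweighted correction
```

Averaging many independent calls to `mlmc_estimate` approximates g(E[X]) without the bias a plug-in estimate g(sample mean) would incur; the paper's contribution concerns the design and finite-sample analysis of such an estimator inside robust Q-learning, which this sketch does not reproduce.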
Pages: 29
Related Papers
50 records in total
  • [1] Sample Complexity of Variance-Reduced Distributionally Robust Q-Learning
    Wang, Shengbo
    Si, Nian
    Blanchet, Jose
    Zhou, Zhengyuan
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [2] Distributionally Robust Q-Learning
    Liu, Zijian
    Bai, Qinxun
    Blanchet, Jose
    Dong, Perry
    Xu, Wei
    Zhou, Zhengqing
    Zhou, Zhengyuan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [3] Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning
    Zhou, Zhengqing
    Zhou, Zhengyuan
    Bai, Qinxun
    Qiu, Linhai
    Blanchet, Jose
    Glynn, Peter
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [4] Path Planning Using Wasserstein Distributionally Robust Deep Q-learning
    Alpturk, Cem
    Renganathan, Venkatraman
    2023 EUROPEAN CONTROL CONFERENCE, ECC, 2023
  • [5] Sample Complexity of Kernel-Based Q-Learning
    Yeh, Sing-Yuan
    Chang, Fu-Chieh
    Yueh, Chang-Wei
    Wu, Pei-Yuan
    Bernacchia, Alberto
    Vakili, Sattar
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023: 453-469
  • [6] Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning
    Li, Gen
    Cai, Changxiao
    Chen, Yuxin
    Gu, Yuantao
    Wei, Yuting
    Chi, Yuejie
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [7] The Sample Complexity of Teaching-by-Reinforcement on Q-Learning
    Zhang, Xuezhou
    Bharti, Shubham Kumar
    Ma, Yuzhe
    Singla, Adish
    Zhu, Xiaojin
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 10939 - 10947
  • [8] Sample Complexity for Distributionally Robust Learning under χ2-divergence
    Zhou, Zhengyu
    Liu, Weiwei
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [9] Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis
    Li, Gen
    Cai, Changxiao
    Chen, Yuxin
    Wei, Yuting
    Chi, Yuejie
    OPERATIONS RESEARCH, 2024, 72(1): 222-236
  • [10] Sample Complexity of Decentralized Tabular Q-Learning for Stochastic Games
    Gao, Zuguang
    Ma, Qianqian
    Basar, Tamer
    Birge, John R.
    2023 AMERICAN CONTROL CONFERENCE, ACC, 2023: 1098-1103