A Finite Sample Complexity Bound for Distributionally Robust Q-learning

Cited by: 0
Authors
Wang, Shengbo [1]
Si, Nian [2]
Blanchet, Jose [1]
Zhou, Zhengyuan [3]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Univ Chicago, Chicago, IL USA
[3] NYU, New York, NY USA
Funding
National Science Foundation (USA);
Keywords
OPTIMIZATION; UNCERTAINTY; GO;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision processes formulation, we extend the distributionally robust Q-learning framework studied in Liu et al. (2022). Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm to learn the optimal robust Q-function within an $\epsilon$ error in the sup norm is upper bounded by $\tilde{O}\left(|S||A|(1-\gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6}\delta^{-4}\right)$, where $\gamma$ is the discount rate, $p_{\wedge}$ is the non-zero minimal support probability of the transition kernels and $\delta$ is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation studies further validate our theoretical results.
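To get a feel for how the bound stated in the abstract scales, the following is a minimal sketch that evaluates only its leading-order term. It ignores the constants and logarithmic factors hidden by the tilde notation, so the returned number is not a sample count. The function name `bound_scale` and the argument names (`num_states`, `num_actions`, `p_min`) are hypothetical and chosen for illustration; they do not come from the paper.

```python
def bound_scale(num_states, num_actions, gamma, epsilon, p_min, delta):
    """Leading-order term |S||A| (1-gamma)^-5 eps^-2 p^-6 delta^-4.

    Illustrative only: constants and log factors are omitted, so this
    value is useful for comparing settings, not for predicting samples.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma (discount rate) must lie in (0, 1)")
    if epsilon <= 0.0 or delta <= 0.0:
        raise ValueError("epsilon and delta must be positive")
    if not 0.0 < p_min <= 1.0:
        raise ValueError("p_min (minimal support probability) must lie in (0, 1]")
    return (
        num_states
        * num_actions
        * (1.0 - gamma) ** -5
        * epsilon ** -2
        * p_min ** -6
        * delta ** -4
    )


if __name__ == "__main__":
    # Hypothetical problem sizes, picked only to compare ratios.
    base = bound_scale(10, 4, gamma=0.9, epsilon=0.1, p_min=0.2, delta=0.5)
    tighter = bound_scale(10, 4, gamma=0.9, epsilon=0.05, p_min=0.2, delta=0.5)
    # Halving epsilon multiplies the leading term by about 4 (eps^-2).
    print("ratio when epsilon is halved:", tighter / base)
```

Comparing ratios rather than absolute values is deliberate: the exponents show that the minimal support probability (power 6) and the effective horizon 1/(1 - gamma) (power 5) dominate the growth, while the accuracy enters only quadratically.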
Pages: 29