A Finite Sample Complexity Bound for Distributionally Robust Q-learning

Cited by: 0

Authors:

Wang, Shengbo [1]

Si, Nian [2]

Blanchet, Jose [1]

Zhou, Zhengyuan [3]

Affiliations:

[1] Stanford Univ, Stanford, CA 94305 USA

[2] Univ Chicago, Chicago, IL USA

[3] NYU, New York, NY USA

Source:

INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206 | 2023 / Vol. 206

Funding:

National Science Foundation (USA);

Keywords:

OPTIMIZATION; UNCERTAINTY; GO;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision processes formulation, we extend the distributionally robust Q-learning framework studied in Liu et al. (2022). Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm to learn the optimal robust Q-function within an $\epsilon$ error in the sup norm is upper bounded by $\tilde{O}(|S||A|(1-\gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6}\delta^{-4})$, where $\gamma$ is the discount rate, $p_{\wedge}$ is the non-zero minimal support probability of the transition kernels and $\delta$ is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation studies further validate our theoretical results.
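
To make the scaling in the stated bound concrete, the following is a minimal sketch that evaluates only its polynomial part. It drops the constants and logarithmic factors hidden by the O-tilde notation, so the returned number is not a sample count and is only meaningful as a ratio between two settings. The function name, parameter names, and example values are illustrative assumptions, not taken from the paper.

```python
def bound_scaling(num_states, num_actions, gamma, epsilon, p_min, delta):
    """Polynomial part of the stated upper bound.

    Assumption: constants and log factors are ignored, so only ratios
    between two calls are informative. `p_min` stands for the non-zero
    minimal support probability and `delta` for the uncertainty size.
    """
    return (
        num_states
        * num_actions
        * (1.0 - gamma) ** -5
        * epsilon ** -2
        * p_min ** -6
        * delta ** -4
    )


# Hypothetical problem sizes, chosen only for illustration.
base = bound_scaling(10, 4, gamma=0.9, epsilon=0.1, p_min=0.1, delta=0.5)
half_eps = bound_scaling(10, 4, gamma=0.9, epsilon=0.05, p_min=0.1, delta=0.5)
half_p = bound_scaling(10, 4, gamma=0.9, epsilon=0.1, p_min=0.05, delta=0.5)

eps_ratio = half_eps / base  # effect of halving the target error
p_ratio = half_p / base      # effect of halving the minimal support probability
```

Under these assumptions, halving the target error multiplies the polynomial part by about 4, while halving the minimal support probability multiplies it by about 64, which reflects the exponents -2 and -6 in the bound.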

Pages: 29
