A Finite Sample Complexity Bound for Distributionally Robust Q-learning

被引：0

作者：

Wang, Shengbo ^{[1
]}

Si, Nian ^{[2
]}

Blanchet, Jose ^{[1
]}

Zhou, Zhengyuan ^{[3
]}

机构：

[1] Stanford Univ, Stanford, CA 94305 USA

[2] Univ Chicago, Chicago, IL USA

[3] NYU, New York, NY USA

来源：

INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206 | 2023年 / 206卷

基金：

美国国家科学基金会;

关键词：

OPTIMIZATION; UNCERTAINTY; GO;

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision processes formulation, we extend the distributionally robust Q-learning framework studied in Liu et al. (2022). Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm to learn the optimal robust Q-function within an epsilon error in the sup norm is upper bounded by (O) over tilde(|S||A|(1 - gamma)(-5)epsilon(-2)p((sic))(-6)delta(-4)), where gamma is the discount rate, p((sic)) is the non-zero minimal support probability of the transition kernels and delta is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation studies further validate our theoretical results.

引用

页数：29

共 50 条

[1] Sample Complexity of Variance-Reduced Distributionally Robust Q-Learning
Wang, Shengbo
Si, Nian
Blanchet, Jose
Zhou, Zhengyuan
JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
[2] Distributionally Robust Q-Learning
Liu, Zijian
Bai, Qinxun
Blanchet, Jose
Dong, Perry
Xu, Wei
Zhou, Zhengqing
Zhou, Zhengyuan
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
[3] Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning
Zhou, Zhengqing
Zhou, Zhengyuan
Bai, Qinxun
Qiu, Linhai
Blanchet, Jose
Glynn, Peter
24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
[4] Path Planning Using Wasserstein Distributionally Robust Deep Q-learning
Alpturk, Cem
Renganathan, Venkatraman
2023 EUROPEAN CONTROL CONFERENCE, ECC, 2023,
[5] Sample Complexity of Kernel-Based Q-Learning
Yeh, Sing-Yuan
Chang, Fu-Chieh
Yueh, Chang-Wei
Wu, Pei-Yuan
Bernacchia, Alberto
Vakili, Sattar
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206 : 453 - 469
[6] Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning
Li, Gen
Cai, Changxiao
Chen, Yuxin
Gu, Yuantao
Wei, Yuting
Chi, Yuejie
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
[7] The Sample Complexity of Teaching-by-Reinforcement on Q-Learning
Zhang, Xuezhou
Bharti, Shubham Kumar
Ma, Yuzhe
Singla, Adish
Zhu, Xiaojin
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 10939 - 10947
[8] Sample Complexity for Distributionally Robust Learning under χ2-divergence
Zhou, Zhengyu
Liu, Weiwei
JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
[9] Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis
Li, Gen
Cai, Changxiao
Chen, Yuxin
Wei, Yuting
Chi, Yuejie
OPERATIONS RESEARCH, 2024, 72 (01) : 222 - 236
[10] Sample Complexity of Decentralized Tabular Q-Learning for Stochastic Games
Gao, Zuguang
Ma, Qianqian
Basar, Tamer
Birge, John R.
2023 AMERICAN CONTROL CONFERENCE, ACC, 2023, : 1098 - 1103

← 1 2 3 4 5 →