A Finite Sample Complexity Bound for Distributionally Robust Q-learning

Cited by: 0

Authors:

Wang, Shengbo [1]

Si, Nian [2]

Blanchet, Jose [1]

Zhou, Zhengyuan [3]

Affiliations:

[1] Stanford Univ, Stanford, CA 94305 USA

[2] Univ Chicago, Chicago, IL USA

[3] NYU, New York, NY USA

Source:

INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206 | 2023 / Vol. 206

Funding:

National Science Foundation (USA);

Keywords:

OPTIMIZATION; UNCERTAINTY; GO;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision process formulation, we extend the distributionally robust Q-learning framework studied in Liu et al. (2022). Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm to learn the optimal robust Q-function within an $\epsilon$ error in the sup norm is upper bounded by $\tilde{O}(|S||A|(1-\gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6}\delta^{-4})$, where $\gamma$ is the discount rate, $p_{\wedge}$ is the non-zero minimal support probability of the transition kernels, and $\delta$ is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation studies further validate our theoretical results.
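
To give a feel for how the stated bound scales, here is a minimal sketch that evaluates only its polynomial part, ignoring the constants and logarithmic factors hidden by the O-tilde notation. The function name `bound_scaling`, its parameter names, and the numeric inputs are hypothetical and chosen for illustration; they do not come from the paper, and the returned value is not an actual sample count.

```python
def bound_scaling(n_states, n_actions, gamma, epsilon, p_min, delta):
    """Polynomial part of the bound in the abstract:
    |S| |A| (1 - gamma)^-5 epsilon^-2 p_min^-6 delta^-4.
    Constants and log factors are ignored (assumption for illustration)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if epsilon <= 0.0 or delta <= 0.0 or not 0.0 < p_min <= 1.0:
        raise ValueError("epsilon and delta must be positive, p_min in (0, 1]")
    return (n_states * n_actions
            * (1.0 - gamma) ** -5
            * epsilon ** -2
            * p_min ** -6
            * delta ** -4)


# Illustrative inputs only; none of these values appear in the paper.
base = bound_scaling(10, 4, gamma=0.9, epsilon=0.1, p_min=0.2, delta=0.5)
half_eps = bound_scaling(10, 4, gamma=0.9, epsilon=0.05, p_min=0.2, delta=0.5)
half_delta = bound_scaling(10, 4, gamma=0.9, epsilon=0.1, p_min=0.2, delta=0.25)

eps_ratio = half_eps / base      # epsilon^-2: halving epsilon gives about 4x
delta_ratio = half_delta / base  # delta^-4: halving delta gives about 16x
print(eps_ratio, delta_ratio)
```

Under these assumptions, the exponents can be read off directly: the term is most sensitive to the minimal support probability (sixth power), then to the effective horizon $1/(1-\gamma)$ (fifth power) and the uncertainty size (fourth power).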

Pages: 29
