Robust Q-learning algorithm for Markov decision processes under Wasserstein uncertainty

Cited: 2
Authors
Neufeld, Ariel [1 ]
Sester, Julian [2 ]
Affiliations
[1] NTU Singapore, Div Math Sci, 21 Nanyang Link, Singapore 637371, Singapore
[2] Natl Univ Singapore, Dept Math, 21 Lower Kent Ridge Rd, Singapore 119077, Singapore
Keywords
Markov decision process; Wasserstein uncertainty; Distributionally robust optimization; Reinforcement learning; Q-learning; STRATEGIES
DOI
10.1016/j.automatica.2024.111825
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
We present a novel Q-learning algorithm tailored to solving distributionally robust Markov decision problems in which the ambiguity set of transition probabilities for the underlying Markov decision process is a Wasserstein ball around a (possibly estimated) reference measure. We prove convergence of the presented algorithm and provide several examples, including ones based on real data, to illustrate both the tractability of our algorithm and the benefits of considering distributional robustness when solving stochastic optimal control problems, in particular when the estimated distributions turn out to be misspecified in practice. © 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
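For orientation only, the following is a minimal tabular sketch of the general idea described in the abstract; it is not the authors' algorithm. It assumes a finite state/action space with an estimated reference kernel P_hat and a state metric d, and it replaces the exact inner optimization over the Wasserstein ball by a crude Kantorovich-Rubinstein lower bound, E_{P_hat}[v] - eps * Lip(v). All names (P_hat, d, eps, robust_q_learning) are illustrative assumptions.

import numpy as np

# Sketch (NOT the published algorithm): tabular Q-learning where the one-step
# expectation of the continuation value is replaced by a conservative bound on
#   inf_{P : W1(P, P_hat[s,a]) <= eps} E_P[v],
# namely  P_hat[s,a] @ v - eps * Lip(v)  (Kantorovich-Rubinstein duality).

def lipschitz_const(v, d):
    """Largest |v(i) - v(j)| / d(i, j) over pairs of distinct states."""
    n = len(v)
    ratios = [abs(v[i] - v[j]) / d[i, j] for i in range(n) for j in range(n) if i != j]
    return max(ratios) if ratios else 0.0

def robust_q_learning(P_hat, R, d, eps=0.1, gamma=0.95, alpha=0.1, steps=5000, rng=None):
    """P_hat: (S, A, S) reference transition kernel, R: (S, A) rewards, d: (S, S) state metric."""
    rng = rng or np.random.default_rng(0)
    S, A = R.shape
    Q = np.zeros((S, A))
    s = rng.integers(S)
    for _ in range(steps):
        a = rng.integers(A) if rng.random() < 0.1 else int(Q[s].argmax())  # epsilon-greedy action
        s_next = rng.choice(S, p=P_hat[s, a])                              # sample from the reference measure
        v = Q.max(axis=1)                                                  # greedy continuation value
        robust_ev = P_hat[s, a] @ v - eps * lipschitz_const(v, d)          # worst-case bound over the Wasserstein ball
        Q[s, a] += alpha * (R[s, a] + gamma * robust_ev - Q[s, a])
        s = s_next
    return Q

With eps = 0 the update reduces to standard Q-learning against the reference measure; larger eps makes the learned policy more conservative. The paper itself establishes convergence for its own robust update rule, which this sketch does not reproduce.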
Pages: 13
Related Papers
50 related records in total (first 10 listed below)
  • [1] A Novel Q-learning Algorithm with Function Approximation for Constrained Markov Decision Processes
    Lakshmanan, K.
    Bhatnagar, Shalabh
    2012 50TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2012, : 400 - 405
  • [2] Q-learning for Markov decision processes with a satisfiability criterion
    Shah, Suhail M.
    Borkar, Vivek S.
    SYSTEMS & CONTROL LETTERS, 2018, 113 : 45 - 51
  • [3] Safe Q-Learning Method Based on Constrained Markov Decision Processes
    Ge, Yangyang
    Zhu, Fei
    Lin, Xinghong
    Liu, Quan
    IEEE ACCESS, 2019, 7 : 165007 - 165017
  • [4] Exploiting the structural properties of the underlying Markov decision problem in the Q-learning algorithm
    Kunnumkal, Sumit
    Topaloglu, Huseyin
    INFORMS JOURNAL ON COMPUTING, 2008, 20 (02) : 288 - 301
  • [5] Relative Q-Learning for Average-Reward Markov Decision Processes With Continuous States
    Yang, Xiangyu
    Hu, Jiaqiao
    Hu, Jian-Qiang
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (10) : 6546 - 6560
  • [6] Kernelized Q-Learning for Large-Scale, Potentially Continuous, Markov Decision Processes
    Sledge, Isaac J.
    Principe, Jose C.
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018, : 153 - 162
  • [7] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (09) : 2184 - 2193
  • [8] Markov decision processes under model uncertainty
    Neufeld, Ariel
    Sester, Julian
    Sikic, Mario
    MATHEMATICAL FINANCE, 2023, 33 (03) : 618 - 665
  • [9] Q-learning algorithms for constrained Markov decision processes with randomized monotone policies:: Application to MIMO transmission control
    Djonin, Dejan V.
    Krishnamurthy, Vikram
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2007, 55 (05) : 2170 - 2181
  • [10] Reinforcement Learning in Robust Markov Decision Processes
    Lim, Shiau Hong
    Xu, Huan
    Mannor, Shie
    MATHEMATICS OF OPERATIONS RESEARCH, 2016, 41 (04) : 1325 - 1353