Robust Q-learning algorithm for Markov decision processes under Wasserstein uncertainty

Cited: 2
Authors
Neufeld, Ariel [1 ]
Sester, Julian [2 ]
Affiliations
[1] NTU Singapore, Div Math Sci, 21 Nanyang Link, Singapore 637371, Singapore
[2] Natl Univ Singapore, Dept Math, 21 Lower Kent Ridge Rd, Singapore 119077, Singapore
Keywords
Markov decision process; Wasserstein uncertainty; Distributionally robust optimization; Reinforcement learning; Q-learning; STRATEGIES
DOI
10.1016/j.automatica.2024.111825
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
We present a novel Q-learning algorithm tailored to solving distributionally robust Markov decision problems in which the ambiguity set of transition probabilities for the underlying Markov decision process is a Wasserstein ball around a (possibly estimated) reference measure. We prove convergence of the presented algorithm and provide several examples, including ones based on real data, to illustrate both the tractability of our algorithm and the benefits of considering distributional robustness when solving stochastic optimal control problems, in particular when the estimated distributions turn out to be misspecified in practice. © 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
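For orientation only, the following is a minimal tabular sketch of the general idea described in the abstract; it is not the authors' algorithm. It assumes a finite state/action space with an estimated reference kernel P_hat and a state metric d, and it replaces the exact inner optimization over the Wasserstein ball by a crude Kantorovich-Rubinstein lower bound, E_{P_hat}[v] - eps * Lip(v). All names (P_hat, d, eps, robust_q_learning) are illustrative assumptions.

import numpy as np

# Sketch (NOT the published algorithm): tabular Q-learning where the one-step
# expectation of the continuation value is replaced by a conservative bound on
#   inf_{P : W1(P, P_hat[s,a]) <= eps} E_P[v],
# namely  P_hat[s,a] @ v - eps * Lip(v)  (Kantorovich-Rubinstein duality).

def lipschitz_const(v, d):
    """Largest |v(i) - v(j)| / d(i, j) over pairs of distinct states."""
    n = len(v)
    ratios = [abs(v[i] - v[j]) / d[i, j] for i in range(n) for j in range(n) if i != j]
    return max(ratios) if ratios else 0.0

def robust_q_learning(P_hat, R, d, eps=0.1, gamma=0.95, alpha=0.1, steps=5000, rng=None):
    """P_hat: (S, A, S) reference transition kernel, R: (S, A) rewards, d: (S, S) state metric."""
    rng = rng or np.random.default_rng(0)
    S, A = R.shape
    Q = np.zeros((S, A))
    s = rng.integers(S)
    for _ in range(steps):
        a = rng.integers(A) if rng.random() < 0.1 else int(Q[s].argmax())  # epsilon-greedy action
        s_next = rng.choice(S, p=P_hat[s, a])                              # sample from the reference measure
        v = Q.max(axis=1)                                                  # greedy continuation value
        robust_ev = P_hat[s, a] @ v - eps * lipschitz_const(v, d)          # worst-case bound over the Wasserstein ball
        Q[s, a] += alpha * (R[s, a] + gamma * robust_ev - Q[s, a])
        s = s_next
    return Q

With eps = 0 the update reduces to standard Q-learning against the reference measure; larger eps makes the learned policy more conservative. The paper itself establishes convergence for its own robust update rule, which this sketch does not reproduce.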
Pages: 13
Related Papers
50 related records in total (first 10 listed below)
  • [1] A Novel Q-learning Algorithm with Function Approximation for Constrained Markov Decision Processes
    Lakshmanan, K.
    Bhatnagar, Shalabh
    2012 50TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2012, : 400 - 405
  • [2] Q-learning for Markov decision processes with a satisfiability criterion
    Shah, Suhail M.
    Borkar, Vivek S.
    SYSTEMS & CONTROL LETTERS, 2018, 113 : 45 - 51
  • [3] Safe Q-Learning Method Based on Constrained Markov Decision Processes
    Ge, Yangyang
    Zhu, Fei
    Lin, Xinghong
    Liu, Quan
    IEEE ACCESS, 2019, 7 : 165007 - 165017
  • [4] Exploiting the structural properties of the underlying Markov decision problem in the Q-learning algorithm
    Kunnumkal, Sumit
    Topaloglu, Huseyin
    INFORMS JOURNAL ON COMPUTING, 2008, 20 (02) : 288 - 301
  • [5] Relative Q-Learning for Average-Reward Markov Decision Processes With Continuous States
    Yang, Xiangyu
    Hu, Jiaqiao
    Hu, Jian-Qiang
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (10) : 6546 - 6560
  • [6] Kernelized Q-Learning for Large-Scale, Potentially Continuous, Markov Decision Processes
    Sledge, Isaac J.
    Principe, Jose C.
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018, : 153 - 162
  • [7] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (09) : 2184 - 2193
  • [8] Markov decision processes under model uncertainty
    Neufeld, Ariel
    Sester, Julian
    Sikic, Mario
    MATHEMATICAL FINANCE, 2023, 33 (03) : 618 - 665
  • [9] Q-learning algorithms for constrained Markov decision processes with randomized monotone policies:: Application to MIMO transmission control
    Djonin, Dejan V.
    Krishnamurthy, Vikram
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2007, 55 (05) : 2170 - 2181
  • [10] Reinforcement Learning in Robust Markov Decision Processes
    Lim, Shiau Hong
    Xu, Huan
    Mannor, Shie
    MATHEMATICS OF OPERATIONS RESEARCH, 2016, 41 (04) : 1325 - 1353