Robust Risk-Sensitive Reinforcement Learning with Conditional Value-at-Risk

Cited by: 0
Authors
Ni, Xinyi [1]
Lai, Lifeng [1]
Affiliations
[1] Univ Calif Davis, Elect & Comp Engn, Davis, CA USA
Source
2024 IEEE Information Theory Workshop, ITW 2024 | 2024
Funding
National Science Foundation (USA);
Keywords
ambiguity sets; RMDP; risk-sensitive RL; CVaR; optimization;
DOI
10.1109/ITW61385.2024.10806953
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Robust Markov Decision Processes (RMDPs) have received significant research interest as an alternative to standard Markov Decision Processes (MDPs), which often assume fixed transition probabilities. RMDPs relax this assumption by optimizing for the worst-case scenario within an ambiguity set. Earlier studies on RMDPs have largely centered on risk-neutral reinforcement learning (RL), where the goal is to minimize the expected total discounted cost. In this paper, we analyze the robustness of CVaR-based risk-sensitive RL under RMDPs. First, we consider predetermined ambiguity sets. Based on the coherence of CVaR, we establish a connection between robustness and risk sensitivity, so that techniques from risk-sensitive RL can be adopted to solve the proposed problem. Furthermore, motivated by the existence of decision-dependent uncertainty in real-world problems, we study problems with state-action-dependent ambiguity sets. To solve these, we define a new risk measure named NCVaR and establish the equivalence of NCVaR optimization and robust CVaR optimization. We further propose value iteration algorithms and validate our approach in simulation experiments.
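As an illustration of the quantities the abstract refers to, the following is a minimal sketch and not the paper's algorithm. It computes the CVaR of a discrete cost distribution with the standard Rockafellar-Uryasev formula, then takes the worst case over a finite, predetermined ambiguity set. The function names `cvar` and `robust_cvar`, the convention that `alpha` in (0, 1] is the tail probability (so `alpha = 1` recovers the expectation), and the restriction to a finite ambiguity set are assumptions made for this example only.

```python
from typing import Sequence


def cvar(costs: Sequence[float], probs: Sequence[float], alpha: float) -> float:
    """CVaR of a discrete cost distribution (hypothetical helper).

    Uses the Rockafellar-Uryasev representation
        CVaR_alpha(Z) = min_w { w + E[(Z - w)^+] / alpha },  alpha in (0, 1].
    The objective is piecewise linear and convex in w, so for a discrete Z
    the minimum is attained at one of the support points.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if len(costs) != len(probs):
        raise ValueError("costs and probs must have the same length")

    def objective(w: float) -> float:
        excess = sum(p * max(z - w, 0.0) for z, p in zip(costs, probs))
        return w + excess / alpha

    return min(objective(w) for w in costs)


def robust_cvar(
    costs: Sequence[float],
    ambiguity_set: Sequence[Sequence[float]],
    alpha: float,
) -> float:
    """Worst-case CVaR over a finite set of candidate distributions.

    Assumption: the ambiguity set is given as an explicit finite list of
    probability vectors over the same cost outcomes.
    """
    return max(cvar(costs, probs, alpha) for probs in ambiguity_set)


# Illustrative numbers only; they are not taken from the paper.
costs = [0.0, 1.0, 10.0]
nominal = [0.5, 0.4, 0.1]
shifted = [0.4, 0.4, 0.2]

nominal_cvar = cvar(costs, nominal, alpha=0.2)
worst_case_cvar = robust_cvar(costs, [nominal, shifted], alpha=0.2)
```

With these illustrative inputs, the worst 20% of outcomes under `nominal` consist of cost 10 with mass 0.1 and cost 1 with mass 0.1, so `nominal_cvar` is 5.5 up to floating-point rounding. Under `shifted` the whole tail sits at cost 10, so `worst_case_cvar` is 10.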
Pages: 520-525
Number of pages: 6