Regret Bounds for Risk-sensitive Reinforcement Learning with Lipschitz Dynamic Risk Measures

Times cited: 0
Authors
Liang, Hao [1 ]
Luo, Zhi-Quan [1 ]
Affiliation
[1] Chinese Univ Hong Kong, Shenzhen, Hong Kong, Peoples R China
Source
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238 | 2024
Keywords
COHERENCE;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study finite episodic Markov decision processes that incorporate dynamic risk measures to capture risk sensitivity. To this end, we present two model-based algorithms for Lipschitz dynamic risk measures, a broad class that subsumes spectral risk measures, optimized certainty equivalents, and distortion risk measures, among others. We establish both upper and lower bounds on the regret. Notably, our upper bounds have optimal dependence on the numbers of actions and episodes, while reflecting the inherent trade-off between risk sensitivity and sample complexity. Our approach provides a unified framework that encompasses multiple existing formulations in the literature and broadens the range of applications.
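To make the objects named in the abstract more concrete, the Python sketch below is an illustration under stated assumptions, not the authors' algorithms. It evaluates a distortion risk measure on a discrete cost distribution, using the CVaR distortion g(u) = min(u/α, 1) as the example, and plugs it into nested backward induction for a small tabular MDP with a known model. The nested (iterated) composition of a static risk measure is one common way to build a dynamic risk measure. The cost convention (larger is worse), the function names `distortion_risk`, `cvar_distortion`, `nested_backward_induction`, and `toy_mdp`, and the toy MDP are all hypothetical choices made here. The paper's model-based learning algorithms and regret analysis are not reproduced.

```python
import numpy as np


def distortion_risk(values, probs, g):
    """Distortion risk measure of a discrete cost distribution (larger = worse).

    Outcomes are sorted from worst to best; the k-th worst outcome receives
    weight g(S_k) - g(S_{k-1}), where S_k is the probability mass of the k
    worst outcomes. With g(u) = u this is the plain expectation.
    """
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(-values)                     # worst (largest cost) first
    tail = np.clip(np.cumsum(probs[order]), 0.0, 1.0)
    g_tail = np.array([g(u) for u in tail])
    weights = np.diff(np.concatenate(([g(0.0)], g_tail)))
    return float(np.dot(weights, values[order]))


def cvar_distortion(alpha):
    """g(u) = min(u / alpha, 1): gives CVaR_alpha, the mean of the worst alpha tail.

    g is (1/alpha)-Lipschitz on [0, 1]; smaller alpha is more risk-averse.
    """
    return lambda u: min(u / alpha, 1.0)


def nested_backward_induction(cost, P, g):
    """Iterated (nested) risk-sensitive backward induction for a tabular MDP.

    cost: (H, S, A) per-step costs; P: (H, S, A, S) transition probabilities.
    Q_h(s, a) = cost_h(s, a) + rho_g(V_{h+1}(s')), s' ~ P_h(. | s, a),
    V_h(s) = min_a Q_h(s, a), with V_H = 0.
    Returns values V of shape (H + 1, S) and a greedy policy of shape (H, S).
    """
    H, S, A = cost.shape
    V = np.zeros((H + 1, S))
    policy = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                Q[s, a] = cost[h, s, a] + distortion_risk(V[h + 1], P[h, s, a], g)
        policy[h] = np.argmin(Q, axis=1)
        V[h] = Q.min(axis=1)
    return V, policy


def toy_mdp():
    """Hypothetical 2-step example: in state 0, action 0 is safe (cost 2),
    action 1 is free but reaches the bad state 1 (cost 10 next step) w.p. 0.1."""
    H, S, A = 2, 2, 2
    cost = np.zeros((H, S, A))
    cost[0, 0, 0] = 2.0
    cost[1, 1, :] = 10.0
    P = np.zeros((H, S, A, S))
    P[0, 0, 0, 0] = 1.0
    P[0, 0, 1] = [0.9, 0.1]
    P[0, 1, :, 1] = 1.0
    P[1, :, :, 0] = 1.0          # last-step transitions do not affect V_H = 0
    return cost, P


if __name__ == "__main__":
    cost, P = toy_mdp()
    for alpha in (1.0, 0.1):
        V, pi = nested_backward_induction(cost, P, cvar_distortion(alpha))
        print(f"alpha={alpha}: V_0(0)={V[0, 0]:.2f}, action in state 0 = {pi[0, 0]}")
```

With alpha = 1 the distortion reduces to the expectation, so the risky action (expected cost 1.0) beats the safe one (cost 2.0). With alpha = 0.1 the risky action is valued at its worst-tail cost of 10, so the safe action is chosen. This illustrates in miniature how the degree of risk aversion changes the optimal policy.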
Pages: 32