Reinforcement learning with dynamic convex risk measures

Cited by: 7

Authors:

Coache, Anthony [1]

Jaimungal, Sebastian [1,2]

Affiliations:

[1] Univ Toronto, Dept Stat Sci, Toronto, ON, Canada

[2] Univ Oxford, Oxford Man Inst, Oxford, England

Source:

MATHEMATICAL FINANCE | 2024 / Vol. 34 / Issue 02

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

actor-critic algorithm; dynamic risk measures; financial hedging; policy gradient; reinforcement learning; robot control; time-consistency; trading strategies; APPROXIMATE; NETWORKS;

DOI:

10.1111/mafi.12388

Chinese Library Classification number:

F8 [Public Finance, Finance];

Subject classification code:

0202 ;

Abstract:

We develop an approach for solving time-consistent risk-sensitive stochastic optimization problems using model-free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies. We further develop an actor-critic style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to three optimization problems: statistical arbitrage trading strategies, financial hedging, and obstacle avoidance robot control.
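The time-consistent dynamic programming principle mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the authors' model-free actor-critic method: it assumes a small, fully known scenario tree and uses CVaR (a coherent, hence convex, risk measure) as the one-step conditional risk measure. The names `cvar` and `dynamic_risk`, the tree layout, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

# A node is a list of branches: (probability, one-step cost, child node or None).
Node = Optional[List[Tuple[float, float, "Node"]]]


def cvar(outcomes: List[Tuple[float, float]], alpha: float) -> float:
    """CVaR at level alpha of a discrete cost distribution.

    `outcomes` holds (cost, probability) pairs. The result is the average
    cost over the worst (1 - alpha) share of the probability mass.
    """
    tail = 1.0 - alpha
    remaining = tail
    total = 0.0
    # Walk through the costs from worst (largest) to best.
    for cost, prob in sorted(outcomes, key=lambda cp: cp[0], reverse=True):
        take = min(prob, remaining)
        total += take * cost
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / tail


def dynamic_risk(node: Node, alpha: float) -> float:
    """Backward recursion: apply the one-step risk measure to the
    immediate cost plus the risk-adjusted value of the child node."""
    if node is None:  # terminal: no further costs
        return 0.0
    return cvar(
        [(cost + dynamic_risk(child, alpha), prob) for prob, cost, child in node],
        alpha,
    )


# Hypothetical two-period tree with state-dependent second-period costs.
up_node = [(0.5, 0.0, None), (0.5, 2.0, None)]
down_node = [(0.5, 1.0, None), (0.5, 5.0, None)]
root = [(0.5, 1.0, up_node), (0.5, 3.0, down_node)]

risk_neutral = dynamic_risk(root, alpha=0.0)  # alpha = 0 reduces to the expectation
risk_averse = dynamic_risk(root, alpha=0.5)   # average over the worst half at each step
print(risk_neutral, risk_averse)
```

Because the risk measure is applied one step at a time to the cost-to-go, the value at each node depends only on what can happen from that node onward, which is the property the abstract refers to as time-consistency. In the paper's setting the tree is not known and the recursion is learned from simulated data instead.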


Pages: 557-587

Number of pages: 31

