Reinforcement learning with dynamic convex risk measures

Cited by: 7
Authors
Coache, Anthony [1]
Jaimungal, Sebastian [1,2]
Affiliations
[1] University of Toronto, Department of Statistical Sciences, Toronto, ON, Canada
[2] University of Oxford, Oxford-Man Institute, Oxford, England
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
actor-critic algorithm; dynamic risk measures; financial hedging; policy gradient; reinforcement learning; robot control; time-consistency; trading strategies
DOI
10.1111/mafi.12388
Chinese Library Classification
F8 (Public Finance; Finance)
Subject Classification Code
0202
Abstract
We develop an approach for solving time-consistent risk-sensitive stochastic optimization problems using model-free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies. We further develop an actor-critic style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to three optimization problems: statistical arbitrage trading strategies, financial hedging, and obstacle avoidance robot control.
Pages: 557-587 (31 pages)
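
As a rough illustration of the approach the abstract outlines, the sketch below implements a risk-sensitive actor-critic in PyTorch on a toy one-dimensional hedging problem. The critic learns the dynamic-risk value of the current policy by enforcing the one-step recursion V(s, t) = rho(loss_t + V(s', t+1)), with CVaR (in its Rockafellar-Uryasev form) standing in for the convex risk measure rho, and the actor descends a pathwise gradient of the same recursive risk. The environment, network sizes, hyperparameters, and the use of a deterministic policy with a pathwise gradient (rather than the paper's policy-gradient rules for randomized policies) are illustrative assumptions, not the authors' construction.

import torch
import torch.nn as nn

torch.manual_seed(0)
ALPHA, HORIZON, K, BATCH = 0.2, 5, 64, 128   # CVaR level, episode length, MC samples, batch size
SIGMA, LAM = 1.0, 0.1                        # market-noise scale, quadratic trading cost

def cvar(losses, alpha=ALPHA):
    # Empirical CVaR_alpha along the last dim, Rockafellar-Uryasev form
    # z + E[(L - z)^+] / alpha, with the (1 - alpha)-quantile as the optimal z.
    z = torch.quantile(losses, 1.0 - alpha, dim=-1, keepdim=True)
    return (z + torch.clamp(losses - z, min=0.0).mean(-1, keepdim=True) / alpha).squeeze(-1)

def mlp():
    return nn.Sequential(nn.Linear(2, 32), nn.Tanh(),
                         nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 1))

def features(s, t):  # network input: (exposure, normalized time)
    return torch.cat([s, t / HORIZON], dim=1)

actor, critic = mlp(), mlp()
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

for epoch in range(2000):
    # Random (state, time) pairs; the exposure s' = s + a evolves deterministically,
    # and the per-step loss s' * eps + LAM * a^2 is random through the market move eps.
    t = torch.randint(0, HORIZON, (BATCH, 1)).float()
    s = 2.0 * torch.randn(BATCH, 1)
    eps = SIGMA * torch.randn(BATCH, K)

    def one_step_risk():
        a = actor(features(s, t))
        s_next = s + a
        step_loss = s_next * eps + LAM * a ** 2                      # (BATCH, K)
        v_next = critic(features(s_next, t + 1))                     # (BATCH, 1)
        v_next = torch.where(t + 1 >= HORIZON, torch.zeros_like(v_next), v_next)
        return cvar(step_loss + v_next)                              # (BATCH,)

    # Critic: regress V(s, t) onto the one-step dynamic-risk target
    # V(s, t) ~ CVaR_alpha(step loss + V(s', t + 1)), estimated over K samples.
    target = one_step_risk().detach()
    critic_loss = ((critic(features(s, t)).squeeze(-1) - target) ** 2).mean()
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # Actor: pathwise gradient of the same nested risk through a and s'.
    actor_loss = one_step_risk().mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

# A risk-averse agent should trade its exposure toward zero.
print(actor(features(torch.tensor([[2.0]]), torch.zeros(1, 1))).item())

The point the sketch tries to make concrete is time-consistency: the risk measure is applied one step at a time and nested through the value function, rather than once to the total episode loss, which is exactly the recursive structure the paper's dynamic programming principle exploits.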