Reinforcement learning with dynamic convex risk measures

Cited by: 7
Authors
Coache, Anthony [1]
Jaimungal, Sebastian [1,2]
Affiliations
[1] Univ Toronto, Dept Stat Sci, Toronto, ON, Canada
[2] Univ Oxford, Oxford Man Inst, Oxford, England
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
actor-critic algorithm; dynamic risk measures; financial hedging; policy gradient; reinforcement learning; robot control; time-consistency; trading strategies
DOI
10.1111/mafi.12388
CLC classification code
F8 [Public Finance, Finance];
Discipline classification code
0202;
Abstract
We develop an approach for solving time-consistent risk-sensitive stochastic optimization problems using model-free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies. We further develop an actor-critic style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to three optimization problems: statistical arbitrage trading strategies, financial hedging, and obstacle avoidance robot control.
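The sketch below is not the authors' implementation; it is a minimal, self-contained illustration of the time-consistent value recursion the abstract refers to, using an empirical one-step Conditional Value-at-Risk (CVaR) as the dynamic convex risk measure on a hypothetical toy MDP. The state/action spaces, dynamics, and all names (e.g., dynamic_risk_value, step) are illustrative assumptions, and brute-force Monte Carlo recursion stands in for the paper's neural-network actor-critic and policy gradient updates.

```python
# Minimal sketch (assumptions, not the paper's method): evaluate a fixed
# softmax policy under a dynamic convex risk measure via the recursion
#   V_T = 0,   V_t(x) = rho_t( c_t(x, a_t) + V_{t+1}(X_{t+1}) ),  a_t ~ pi_t(. | x),
# with rho_t taken to be an empirical one-step CVaR.
import numpy as np

rng = np.random.default_rng(0)

T = 3             # horizon
N_STATES = 2      # toy state space {0, 1}
N_ACTIONS = 2     # toy action space {0, 1}
ALPHA = 0.9       # CVaR confidence level
N_SAMPLES = 2000  # Monte Carlo samples per one-step evaluation


def cvar(samples, alpha=ALPHA):
    """Empirical CVaR_alpha of a cost sample (a convex risk measure)."""
    q = np.quantile(samples, alpha)      # empirical value-at-risk
    tail = samples[samples >= q]
    return tail.mean() if tail.size else q


def step(state, action):
    """Toy dynamics: per-sample noisy cost and a random next state."""
    cost = action + 0.5 * state + rng.normal(0.0, 1.0, size=N_SAMPLES)
    next_state = rng.integers(0, N_STATES, size=N_SAMPLES)
    return cost, next_state


def policy(theta, t, state):
    """Softmax policy over actions at time t in a given state."""
    logits = theta[t, state]
    p = np.exp(logits - logits.max())
    return p / p.sum()


def dynamic_risk_value(theta, t, state):
    """Recursive, time-consistent risk-to-go of the policy parametrized by theta."""
    if t == T:
        return 0.0
    actions = rng.choice(N_ACTIONS, size=N_SAMPLES, p=policy(theta, t, state))
    cost, next_state = step(state, actions)
    # Continuation values for each possible next state (brute-force recursion).
    cont = np.array([dynamic_risk_value(theta, t + 1, s) for s in range(N_STATES)])
    return cvar(cost + cont[next_state])


theta = np.zeros((T, N_STATES, N_ACTIONS))  # uniform policy at every (t, state)
print("risk-sensitive value from state 0:", dynamic_risk_value(theta, 0, 0))
```

In the paper, the conditional risk-to-go is instead approximated by a neural-network critic and the policy by a neural-network actor updated with policy gradients; the brute-force recursion above is only meant to make the time-consistent value definition concrete.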
Pages: 557-587
Number of pages: 31