Reinforcement learning with dynamic convex risk measures

Cited by: 7
Authors
Coache, Anthony [1]
Jaimungal, Sebastian [1,2]
Affiliations
[1] University of Toronto, Department of Statistical Sciences, Toronto, ON, Canada
[2] University of Oxford, Oxford-Man Institute, Oxford, England
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
actor-critic algorithm; dynamic risk measures; financial hedging; policy gradient; reinforcement learning; robot control; time-consistency; trading strategies
DOI
10.1111/mafi.12388
Chinese Library Classification
F8 (Public Finance; Finance)
Subject Classification Code
0202
Abstract
We develop an approach for solving time-consistent risk-sensitive stochastic optimization problems using model-free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies. We further develop an actor-critic style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to three optimization problems: statistical arbitrage trading strategies, financial hedging, and obstacle avoidance robot control.
Pages: 557-587 (31 pages)
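
As a rough illustration of the approach the abstract outlines, the sketch below implements a risk-sensitive actor-critic in PyTorch on a toy one-dimensional hedging problem. The critic learns the dynamic-risk value of the current policy by enforcing the one-step recursion V(s, t) = rho(loss_t + V(s', t+1)), with CVaR (in its Rockafellar-Uryasev form) standing in for the convex risk measure rho, and the actor descends a pathwise gradient of the same recursive risk. The environment, network sizes, hyperparameters, and the use of a deterministic policy with a pathwise gradient (rather than the paper's policy-gradient rules for randomized policies) are illustrative assumptions, not the authors' construction.

import torch
import torch.nn as nn

torch.manual_seed(0)
ALPHA, HORIZON, K, BATCH = 0.2, 5, 64, 128   # CVaR level, episode length, MC samples, batch size
SIGMA, LAM = 1.0, 0.1                        # market-noise scale, quadratic trading cost

def cvar(losses, alpha=ALPHA):
    # Empirical CVaR_alpha along the last dim, Rockafellar-Uryasev form
    # z + E[(L - z)^+] / alpha, with the (1 - alpha)-quantile as the optimal z.
    z = torch.quantile(losses, 1.0 - alpha, dim=-1, keepdim=True)
    return (z + torch.clamp(losses - z, min=0.0).mean(-1, keepdim=True) / alpha).squeeze(-1)

def mlp():
    return nn.Sequential(nn.Linear(2, 32), nn.Tanh(),
                         nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 1))

def features(s, t):  # network input: (exposure, normalized time)
    return torch.cat([s, t / HORIZON], dim=1)

actor, critic = mlp(), mlp()
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

for epoch in range(2000):
    # Random (state, time) pairs; the exposure s' = s + a evolves deterministically,
    # and the per-step loss s' * eps + LAM * a^2 is random through the market move eps.
    t = torch.randint(0, HORIZON, (BATCH, 1)).float()
    s = 2.0 * torch.randn(BATCH, 1)
    eps = SIGMA * torch.randn(BATCH, K)

    def one_step_risk():
        a = actor(features(s, t))
        s_next = s + a
        step_loss = s_next * eps + LAM * a ** 2                      # (BATCH, K)
        v_next = critic(features(s_next, t + 1))                     # (BATCH, 1)
        v_next = torch.where(t + 1 >= HORIZON, torch.zeros_like(v_next), v_next)
        return cvar(step_loss + v_next)                              # (BATCH,)

    # Critic: regress V(s, t) onto the one-step dynamic-risk target
    # V(s, t) ~ CVaR_alpha(step loss + V(s', t + 1)), estimated over K samples.
    target = one_step_risk().detach()
    critic_loss = ((critic(features(s, t)).squeeze(-1) - target) ** 2).mean()
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # Actor: pathwise gradient of the same nested risk through a and s'.
    actor_loss = one_step_risk().mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

# A risk-averse agent should trade its exposure toward zero.
print(actor(features(torch.tensor([[2.0]]), torch.zeros(1, 1))).item())

The point the sketch tries to make concrete is time-consistency: the risk measure is applied one step at a time and nested through the value function, rather than once to the total episode loss, which is exactly the recursive structure the paper's dynamic programming principle exploits.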