Hierarchical Deep Reinforcement Learning for VWAP Strategy Optimization

Cited by: 0
Authors
Li, Xiaodong [1]
Wu, Pangjing [1]
Zou, Chenxin [1]
Li, Qing [1, 2]
Affiliations
[1] Hohai Univ, Coll Comp & Informat, Nanjing 211100, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Hong Kong 999077, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Costs; Heuristic algorithms; Task analysis; Big Data; Portfolios; Microstructure; Stock markets; Algorithmic trading; deep learning; hierarchical reinforcement learning; optimized trade execution; EXECUTION; FRAMEWORK
DOI
10.1109/TBDATA.2023.3338011
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Designing algorithmic trading strategies that target the volume-weighted average price (VWAP) for long-duration orders is a critical concern for brokers. Traditional rule-based strategies are explicitly predetermined and lack the adaptability needed to achieve lower transaction costs in dynamic markets. Numerous studies have attempted to minimize transaction costs through reinforcement learning. However, the improvement for long-duration order trading strategies, such as the VWAP strategy, remains limited due to changes in intraday liquidity patterns and sparse reward signals. To address this issue, we propose a joint model called Macro-Meta-Micro Trader, which combines deep learning and hierarchical reinforcement learning. This model aims to optimize parent order allocation and child order execution in the VWAP strategy, thereby reducing transaction costs for long-duration orders. It effectively captures market patterns and executes orders across different temporal scales. Our experiments on stocks listed on the Shanghai Stock Exchange demonstrate that our approach outperforms the best baselines in terms of VWAP slippage, saving up to 2.22 basis points, which verifies that further splitting tranches into several subgoals can effectively reduce transaction costs.
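For readers unfamiliar with the metric, the following is a minimal Python sketch of how VWAP and VWAP slippage in basis points are commonly computed. The abstract does not give the paper's exact slippage formula, so the definition here (signed relative difference between the execution VWAP and the market VWAP, scaled by 10,000, with positive values meaning a cost) is an assumption. The function names `vwap` and `vwap_slippage_bps` are hypothetical.

```python
def vwap(prices, volumes):
    """Volume-weighted average price: sum(p_i * v_i) / sum(v_i)."""
    total_volume = sum(volumes)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return sum(p * v for p, v in zip(prices, volumes)) / total_volume


def vwap_slippage_bps(exec_prices, exec_volumes,
                      market_prices, market_volumes, side="buy"):
    """Slippage of our fills against the market VWAP, in basis points.

    Assumed convention (not taken from the paper): a positive result is
    a cost, i.e. buying above or selling below the market VWAP.
    """
    exec_vwap = vwap(exec_prices, exec_volumes)
    market_vwap = vwap(market_prices, market_volumes)
    sign = 1.0 if side == "buy" else -1.0
    # 1 basis point = 0.01% = 1e-4, hence the factor of 10,000.
    return sign * (exec_vwap - market_vwap) / market_vwap * 1e4
```

Under this convention, a strategy that lowers slippage by 2.22 basis points on a buy order pays about 0.0222% less, relative to the market VWAP, than the baseline does.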
Pages: 288-300
Number of pages: 13