Deep Reinforcement Learning in Continuous Action Spaces for Pair Trading: A Comparative Study of A2C and PPO

Cited by: 0
Authors
Cristian Quintero [1]
Diego Leon [1]
Javier Sandoval [1]
German Hernandez [2]
Affiliations
[1] Department of Finance, Universidad Externado de Colombia, Bogota, DC
[2] Department of Engineering, Universidad Nacional de Colombia, Bogota, DC
Keywords
Algorithmic trading; Deep reinforcement learning; Pair trading strategy
DOI
10.1007/s42979-025-03854-0
Abstract
Reinforcement learning (RL) is changing financial trading strategies by offering adaptive solutions for complex market behaviors. This paper examines the application of two RL algorithms, advantage actor-critic (A2C) and proximal policy optimization (PPO), to a pair trading strategy, a market-neutral approach that exploits the price dynamics between correlated assets. Using historical price data from S&P 500 assets, we train and test these models and compare their performance with the industry-standard benchmark based on mean deviation. The results show that RL-based strategies achieve positive investment metrics for most pairs tested. Specifically, the A2C algorithm achieved an average Sharpe ratio of 0.42 and a Sortino ratio of 0.63 for the AAP-SO pair, while the PPO algorithm achieved a Sharpe ratio of 0.41 and a Sortino ratio of 0.62 for the same pair. However, the benchmark model outperforms the RL models in several cases, suggesting opportunities for further refinement. This research underscores both the viability and the limitations of RL in optimizing pair trading strategies within real-world constraints. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2025.
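The abstract reports Sharpe and Sortino ratios without stating how they were computed. As a minimal sketch of the textbook definitions only, the following assumes per-period returns, a zero risk-free rate and zero target return by default, and no annualization; the function names `sharpe_ratio` and `sortino_ratio` are hypothetical, and the paper's own conventions may differ.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation.

    Assumption: `returns` are per-period returns and `risk_free` is the
    per-period risk-free rate; no annualization factor is applied.
    """
    excess = [r - risk_free for r in returns]
    sd = stdev(excess)  # sample standard deviation (n - 1 denominator)
    return mean(excess) / sd if sd > 0 else float("nan")


def sortino_ratio(returns, target=0.0):
    """Mean excess return divided by the downside deviation.

    Assumption: downside deviation is the root mean square of the
    shortfalls below `target`, averaged over all periods.
    """
    excess = [r - target for r in returns]
    downside = math.sqrt(mean(min(e, 0.0) ** 2 for e in excess))
    return mean(excess) / downside if downside > 0 else float("nan")


if __name__ == "__main__":
    # Illustrative returns only, not data from the paper.
    sample = [0.01, -0.02, 0.03, 0.00]
    print(f"Sharpe:  {sharpe_ratio(sample):.4f}")
    print(f"Sortino: {sortino_ratio(sample):.4f}")
```

Because the Sortino ratio penalizes only returns below the target, it is at least as large as the Sharpe ratio whenever the mean excess return is positive and losses are less dispersed than total returns, which is consistent with the ordering of the figures quoted in the abstract.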