Continuous control with Stacked Deep Dynamic Recurrent Reinforcement Learning for portfolio optimization

Cited by: 52
Authors
Aboussalah, Amine Mohamed [1 ]
Lee, Chi-Guhn [1 ]
Affiliations
[1] Univ Toronto, Dept Mech & Ind Engn, Toronto, ON M5S 3G8, Canada
Funding
Academy of Finland;
Keywords
Reinforcement learning; Policy gradient; Deep learning; Sequential model-based optimization; Financial time series; Portfolio management; Trading systems;
DOI
10.1016/j.eswa.2019.112891
CLC Classification Number
TP18 [Artificial intelligence theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recurrent reinforcement learning (RRL) techniques have been used to optimize asset trading systems and have achieved outstanding results. However, most previous work has been devoted to systems with discrete action spaces. To address the challenge of continuous action and multi-dimensional state spaces, we propose the so-called Stacked Deep Dynamic Recurrent Reinforcement Learning (SDDRRL) architecture to construct a real-time optimal portfolio. The algorithm captures up-to-date market conditions and rebalances the portfolio accordingly. Within this framework, the Sharpe ratio, one of the most widely accepted measures of risk-adjusted return, is used as the performance metric. Additionally, because the performance of most machine learning algorithms depends strongly on their hyperparameter settings, we equipped SDDRRL with the ability to find the best possible architecture topology using an automated Gaussian Process (GP) with Expected Improvement (EI) as the acquisition function. This allows us to select the architecture that maximizes the total return while respecting the cardinality constraints. Finally, our system was trained and tested in an online manner for 20 successive rounds on data for ten selected stocks from different sectors of the S&P 500, covering January 1st, 2013 to July 31st, 2017. The experiments reveal that the proposed SDDRRL achieves superior performance compared to three benchmarks: the rolling-horizon Mean-Variance Optimization (MVO) model, the rolling-horizon risk parity model, and the uniform buy-and-hold (UBAH) index. (C) 2019 Elsevier Ltd. All rights reserved.
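The abstract combines two techniques that are easier to follow with a concrete picture: an RRL-style policy gradient that outputs continuous portfolio weights and is trained on a Sharpe ratio objective, and a Gaussian Process with Expected Improvement for tuning the architecture. The sketch below illustrates only the first idea in a heavily simplified form; the linear feature map, the softmax weight layer, the finite-difference gradient, and the synthetic returns are all assumptions for illustration and do not reproduce the authors' stacked recurrent architecture.

```python
# Minimal sketch: policy-gradient trading on a Sharpe ratio objective with
# continuous (softmax) portfolio weights. Everything here is a simplified
# stand-in for SDDRRL: a linear, non-recurrent policy, synthetic returns,
# and a finite-difference gradient instead of the analytic RRL gradient.
import numpy as np

rng = np.random.default_rng(0)
T, n_assets = 500, 10                                  # horizon, portfolio size
returns = rng.normal(0.0005, 0.01, (T, n_assets))      # synthetic daily returns

def portfolio_weights(theta, features):
    """Map features to continuous, long-only weights that sum to 1."""
    z = theta * features
    e = np.exp(z - z.max())
    return e / e.sum()

def episode_sharpe(theta):
    """Run the policy over the whole series and score it by Sharpe ratio."""
    rets = np.array([portfolio_weights(theta, returns[t - 1]) @ returns[t]
                     for t in range(1, T)])
    return rets.mean() / (rets.std() + 1e-8)

theta, lr, eps = np.zeros(n_assets), 0.5, 1e-4
for epoch in range(50):
    base = episode_sharpe(theta)
    grad = np.zeros_like(theta)
    for i in range(n_assets):                          # finite-difference gradient
        bumped = theta.copy()
        bumped[i] += eps
        grad[i] = (episode_sharpe(bumped) - base) / eps
    theta += lr * grad                                 # ascend the Sharpe ratio

print(f"final per-step Sharpe: {episode_sharpe(theta):.4f}")
```

For the hyperparameter search, the abstract names a GP surrogate with EI as the acquisition function; the paper's own implementation is not shown here, but scikit-optimize's `gp_minimize` implements that same combination. The search space and the dummy objective below are placeholders for "train SDDRRL with these hyperparameters and return the negative total return".

```python
# Hedged sketch of GP-based Bayesian optimization with Expected Improvement,
# using scikit-optimize. The two dimensions and the toy objective are assumed
# for illustration; in the paper the objective would be the (negated) total
# return of a trained model under the cardinality constraints.
from skopt import gp_minimize
from skopt.space import Integer, Real

search_space = [
    Integer(1, 5, name="n_stacked_layers"),                  # assumed knob
    Real(1e-4, 1e-1, prior="log-uniform", name="learning_rate"),
]

def objective(params):
    n_layers, lr = params
    # Placeholder for: -total_return(train_sddrrl(n_layers, lr))
    return (n_layers - 3) ** 2 + (lr - 0.01) ** 2

result = gp_minimize(objective, search_space, acq_func="EI",
                     n_calls=20, random_state=0)
print("best hyperparameters:", result.x, "objective:", result.fun)
```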
Pages: 11