Online portfolio management via deep reinforcement learning with high-frequency data

Cited by: 26
Authors
Li, Jiahao [1 ]
Zhang, Yong [1 ]
Yang, Xingyu [1 ]
Chen, Liangwei [1 ]
Affiliations
[1] Guangdong University of Technology, School of Management, Guangzhou 510520, People's Republic of China
Keywords
Portfolio management; Deep reinforcement learning; Cryptocurrency; Bitcoin; Online learning; High-frequency trading; REVERSION STRATEGY; TRADING SYSTEM; OPTIMIZATION; PERFORMANCE; ATTENTION; ALGORITHM; LEVEL
DOI
10.1016/j.ipm.2022.103247
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Recently, models based on the Transformer (Vaswani et al., 2017) have yielded superior results in many sequence modeling tasks. The Transformer's ability to capture long-range dependencies and interactions makes it attractive for portfolio management (PM). However, the Transformer's built-in quadratic complexity prevents its direct application to the PM task. To solve this problem, in this paper we propose a deep reinforcement learning-based PM framework called LSRE-CAAN, with two important components: a long sequence representations extractor (LSRE) and a cross-asset attention network (CAAN). Direct Policy Gradient is used to solve the sequential decision problem in the PM process. We conduct numerical experiments in three aspects on four different cryptocurrency datasets, and the empirical results show that our framework is more effective than both traditional and state-of-the-art (SOTA) online portfolio strategies, achieving a 6x return on the best dataset. In terms of risk metrics, our framework has an average volatility of 0.46 and an average maximum drawdown of 0.27 across the four datasets, both lower than those of the vast majority of SOTA strategies. In addition, while most SOTA strategies suffer average turnover rates above roughly 50%, our framework maintains a relatively low turnover rate on all datasets. An efficiency analysis shows that our framework no longer has the quadratic dependency limitation.
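The abstract describes training a policy by Direct Policy Gradient, where a network attends across assets to produce portfolio weights and the reward is the (transaction-cost-adjusted) log return of those weights. The PyTorch sketch below illustrates only that general mechanism; the module names (PolicyNet, train_step), the GRU standing in for the long sequence representations extractor, the single nn.MultiheadAttention layer standing in for the cross-asset attention network, and the proportional cost model are all illustrative assumptions, not the authors' LSRE-CAAN implementation.

```python
# Minimal sketch of direct policy gradient for portfolio management.
# All architecture choices here are simplified stand-ins for LSRE-CAAN.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Maps a window of per-asset features to long-only portfolio weights."""
    def __init__(self, n_assets, n_features, d_model=32):
        super().__init__()
        # GRU as a stand-in for the long-sequence representation extractor
        self.encoder = nn.GRU(n_features, d_model, batch_first=True)
        # Attention across assets (not across time), batch of 1
        self.cross_asset_attn = nn.MultiheadAttention(d_model, num_heads=4,
                                                      batch_first=True)
        self.score = nn.Linear(d_model, 1)

    def forward(self, x):
        # x: (n_assets, window, n_features) -- each asset's recent history
        _, h = self.encoder(x)                        # (1, n_assets, d_model)
        attn_out, _ = self.cross_asset_attn(h, h, h)  # assets as "tokens"
        scores = self.score(attn_out).squeeze(-1)     # (1, n_assets)
        return torch.softmax(scores, dim=-1).squeeze(0)  # weights sum to 1

def train_step(policy, optimizer, windows, rel_prices, cost=2.5e-3):
    """One gradient update over an episode of T rebalancing periods.

    windows:    (T, n_assets, window, n_features) feature tensors
    rel_prices: (T, n_assets) price relatives p_{t+1} / p_t per asset
    """
    n_assets = rel_prices.shape[1]
    log_return = 0.0
    prev_w = torch.full((n_assets,), 1.0 / n_assets)  # start uniform
    for t in range(rel_prices.shape[0]):
        w = policy(windows[t])
        turnover = (w - prev_w).abs().sum()           # proxy for trading cost
        period_return = (w * rel_prices[t]).sum() * (1 - cost * turnover)
        log_return = log_return + torch.log(period_return)
        prev_w = w.detach()  # simplification: no gradient across periods
    loss = -log_return / rel_prices.shape[0]          # maximize avg log return
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return -loss.item()

if __name__ == "__main__":
    torch.manual_seed(0)
    policy = PolicyNet(n_assets=8, n_features=4)
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    windows = torch.randn(30, 8, 50, 4)           # 30 periods, 50-step windows
    rel_prices = 1.0 + 0.01 * torch.randn(30, 8)  # synthetic price relatives
    for epoch in range(5):
        avg_log_ret = train_step(policy, opt, windows, rel_prices)
```

Because the cost-adjusted log return is a differentiable function of the weights, the policy can be optimized by plain gradient ascent rather than likelihood-ratio estimators, which is the usual reading of "direct policy gradient" in portfolio management work.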
Pages: 21
References
97 in total
[1] Agarwal, A. (2006). In Proceedings of the 23rd International Conference on Machine Learning, p. 9. DOI: 10.1145/1143844.1143846
[2] Almahdi, S., & Yang, S. Y. (2017). An adaptive portfolio trading system: A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown. Expert Systems with Applications, 87, 267-279.
[3] Bao, G. (2021). arXiv preprint.
[4] Beltagy, I. (2020). arXiv:2004.05150. DOI: 10.48550/arXiv.2004.05150
[5] Bertoluzzo, F. (2007). Lecture Notes in Artificial Intelligence, 4693, 619.
[6] Blum, A., & Kalai, A. (1999). Universal portfolios with and without transaction costs. Machine Learning, 35(3), 193-205.
[7] Borodin, A., El-Yaniv, R., & Gogan, V. (2004). Can we learn to beat the best stock. Journal of Artificial Intelligence Research, 21, 579-594.
[8] Cai, X. (2020). In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, p. 4469.
[9] Cai, X., & Ye, Z. (2019). Gaussian weighting reversion strategy for accurate online portfolio selection. IEEE Transactions on Signal Processing, 67(21), 5558-5570.
[10] Chang, J., Tu, W., Yu, C., & Qin, C. (2021). Assessing dynamic qualities of investor sentiments for stock recommendation. Information Processing & Management, 58(2).