Improving Generalization in Reinforcement Learning-Based Trading by Using a Generative Adversarial Market Model

Cited: 19
Authors
Kuo, Chia-Hsuan [1 ]
Chen, Chiao-Ting [2 ]
Lin, Sin-Jing [1 ]
Huang, Szu-Hao [3 ]
Affiliations
[1] Natl Chiao Tung Univ, Inst Informat Management, Hsinchu 30010, Taiwan
[2] Natl Chiao Tung Univ, Dept Comp Sci, Hsinchu 30010, Taiwan
[3] Natl Chiao Tung Univ, Dept Informat Management & Finance, Hsinchu 30010, Taiwan
Keywords
Portfolios; Training; Optimization; Topology; Data models; Stock markets; Network topology; Artificial market simulation; portfolio management; reinforcement learning; PORTFOLIO; STRATEGY
DOI
10.1109/ACCESS.2021.3068269
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
With the increasing sophistication of artificial intelligence, reinforcement learning (RL) has been widely applied to portfolio management. However, shortcomings remain. Specifically, because the training environment of an RL-based portfolio optimization framework is, in the existing literature, usually constructed from historical price data, the agent potentially 1) violates the definition of a Markov decision process (MDP), 2) ignores its own market impact, or 3) fails to account for causal relationships within interaction processes; these shortcomings ultimately lead the agent to generalize poorly. To surmount these problems, and specifically to help the RL-based portfolio agent generalize better, we introduce an interactive training environment that leverages a generative model, called the limit order book-generative adversarial model (LOB-GAN), to simulate a financial market. The LOB-GAN models market ordering behavior, and its generator is used as a market behavior simulator. A simulated financial market, called Virtual Market, is constructed by combining the market behavior simulator with a realistic security matching system. Virtual Market is then used as an interactive training environment for the RL-based portfolio agent. The experimental results demonstrate that our framework improves out-of-sample portfolio performance by 4%, outperforming other generalization strategies.
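The interaction loop the abstract describes (a trained generator emitting synthetic orders, a matching step producing the next price, and the agent's own order contributing market impact) can be sketched in a minimal, heavily simplified form. All names here (`GeneratorStub`, `VirtualMarketEnv`) and the toy price-impact matching rule are hypothetical illustrations, not the paper's actual LOB-GAN or matching system:

```python
import numpy as np

rng = np.random.default_rng(0)

class GeneratorStub:
    """Stand-in for a trained LOB-GAN generator: maps latent noise to a
    batch of synthetic limit orders (price offset, size, side)."""
    def sample_orders(self, mid_price, n=32):
        noise = rng.standard_normal((n, 3))
        price_off = noise[:, 0] * 0.1 * mid_price   # offset from mid-price
        size = np.abs(noise[:, 1]) * 100 + 1.0      # order size
        side = np.sign(noise[:, 2])                 # +1 buy, -1 sell
        return np.stack([price_off, size, side], axis=1)

class VirtualMarketEnv:
    """Gym-style interactive environment: generator orders plus the agent's
    order are 'matched' via a toy linear price-impact rule."""
    def __init__(self, generator, mid0=100.0):
        self.generator = generator
        self.mid0 = mid0
        self.mid = mid0

    def reset(self):
        self.mid = self.mid0
        return np.array([self.mid])

    def step(self, agent_weight):
        orders = self.generator.sample_orders(self.mid)
        # Net signed order flow moves the mid-price; the agent's own
        # order adds its market impact (point 2 in the abstract).
        flow = np.sum(orders[:, 1] * orders[:, 2]) + agent_weight * 100.0
        new_mid = self.mid * (1.0 + 1e-5 * flow)
        # Reward: one-step portfolio return on the agent's position.
        reward = agent_weight * (new_mid - self.mid) / self.mid
        self.mid = new_mid
        return np.array([self.mid]), reward, False, {}

env = VirtualMarketEnv(GeneratorStub())
obs = env.reset()
obs, reward, done, info = env.step(0.5)
```

Because the environment responds to the agent's actions at every step, the simulated dynamics depend on the agent's behavior, unlike a replay of fixed historical prices.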
Pages: 50738-50754
Page count: 17