Improving Generalization in Reinforcement Learning-Based Trading by Using a Generative Adversarial Market Model

Cited: 19
Authors
Kuo, Chia-Hsuan [1 ]
Chen, Chiao-Ting [2 ]
Lin, Sin-Jing [1 ]
Huang, Szu-Hao [3 ]
Affiliations
[1] Natl Chiao Tung Univ, Inst Informat Management, Hsinchu 30010, Taiwan
[2] Natl Chiao Tung Univ, Dept Comp Sci, Hsinchu 30010, Taiwan
[3] Natl Chiao Tung Univ, Dept Informat Management & Finance, Hsinchu 30010, Taiwan
Keywords
Portfolios; Training; Optimization; Topology; Data models; Stock markets; Network topology; Artificial market simulation; portfolio management; reinforcement learning; PORTFOLIO; STRATEGY
DOI
10.1109/ACCESS.2021.3068269
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
With the increasing sophistication of artificial intelligence, reinforcement learning (RL) has been widely applied to portfolio management. However, shortcomings remain. Specifically, because the training environment of an RL-based portfolio optimization framework is, in the existing literature, usually constructed from historical price data, the agent potentially 1) violates the definition of a Markov decision process (MDP), 2) ignores its own market impact, or 3) fails to account for causal relationships within interaction processes; these shortcomings ultimately lead the agent to generalize poorly. To surmount these problems, and specifically to help the RL-based portfolio agent generalize better, we introduce an interactive training environment that leverages a generative model, called the limit order book-generative adversarial model (LOB-GAN), to simulate a financial market. The LOB-GAN models market ordering behavior, and its generator is used as a market behavior simulator. A simulated financial market, called Virtual Market, is constructed by combining the market behavior simulator with a realistic security matching system. Virtual Market is then used as an interactive training environment for the RL-based portfolio agent. The experimental results demonstrate that our framework improves out-of-sample portfolio performance by 4%, outperforming other generalization strategies.
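The interaction loop the abstract describes (a trained generator emitting synthetic orders, a matching step producing the next price, and the agent's own order contributing market impact) can be sketched in a minimal, heavily simplified form. All names here (`GeneratorStub`, `VirtualMarketEnv`) and the toy price-impact matching rule are hypothetical illustrations, not the paper's actual LOB-GAN or matching system:

```python
import numpy as np

rng = np.random.default_rng(0)

class GeneratorStub:
    """Stand-in for a trained LOB-GAN generator: maps latent noise to a
    batch of synthetic limit orders (price offset, size, side)."""
    def sample_orders(self, mid_price, n=32):
        noise = rng.standard_normal((n, 3))
        price_off = noise[:, 0] * 0.1 * mid_price   # offset from mid-price
        size = np.abs(noise[:, 1]) * 100 + 1.0      # order size
        side = np.sign(noise[:, 2])                 # +1 buy, -1 sell
        return np.stack([price_off, size, side], axis=1)

class VirtualMarketEnv:
    """Gym-style interactive environment: generator orders plus the agent's
    order are 'matched' via a toy linear price-impact rule."""
    def __init__(self, generator, mid0=100.0):
        self.generator = generator
        self.mid0 = mid0
        self.mid = mid0

    def reset(self):
        self.mid = self.mid0
        return np.array([self.mid])

    def step(self, agent_weight):
        orders = self.generator.sample_orders(self.mid)
        # Net signed order flow moves the mid-price; the agent's own
        # order adds its market impact (point 2 in the abstract).
        flow = np.sum(orders[:, 1] * orders[:, 2]) + agent_weight * 100.0
        new_mid = self.mid * (1.0 + 1e-5 * flow)
        # Reward: one-step portfolio return on the agent's position.
        reward = agent_weight * (new_mid - self.mid) / self.mid
        self.mid = new_mid
        return np.array([self.mid]), reward, False, {}

env = VirtualMarketEnv(GeneratorStub())
obs = env.reset()
obs, reward, done, info = env.step(0.5)
```

Because the environment responds to the agent's actions at every step, the simulated dynamics depend on the agent's behavior, unlike a replay of fixed historical prices.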
Pages: 50738-50754
Page count: 17