Action-specialized expert ensemble trading system with extended discrete action space using deep reinforcement learning

Cited by: 0
Authors
Leem, JoonBum [1 ]
Kim, Ha Young [2 ]
Affiliations
[1] Ajou Univ, Dept Financial Engn, Suwon, South Korea
[2] Yonsei Univ, Grad Sch Informat, Seoul, South Korea
Source
PLOS ONE | 2020 / Vol. 15 / Issue 07
Keywords
PREDICTING STOCK; MACHINE; LIQUIDITY; FRAMEWORK; RETURNS;
DOI
10.1371/journal.pone.0236178
CLC Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy, Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject Classification Codes
07 ; 0710 ; 09 ;
Abstract
Despite active research on trading systems based on reinforcement learning, existing methods still require improvement in both design and performance. This study proposes a new action-specialized expert ensemble method consisting of expert models designed specifically for each reinforcement learning action: buy, hold, and sell. The models are constructed by defining distinct reward values that correlate with each action under specific conditions, so that each expert model reflects a different investment behavior. To verify the performance of this technique, the profits of the proposed system are compared to those of single trading and common ensemble systems. To verify robustness under an extended discrete action space, we compared and analyzed the resulting changes in the profits of the three actions against our model's results. Furthermore, we checked for sensitivity with three different reward functions: profit, Sharpe ratio, and Sortino ratio. All experiments were conducted with S&P500, Hang Seng Index, and Eurostoxx50 data. The model was 39.1% and 21.6% more efficient than single and common ensemble models, respectively. When the 3-action space was extended to 11- and 21-action spaces, the cumulative returns increased by 427.2% and 856.7%, respectively. Results on reward functions indicated that our models are well trained; results with the Sharpe and Sortino ratios were better than with profit alone, as in the single-model cases. The Sortino ratio was slightly better than the Sharpe ratio.
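The risk-adjusted reward functions and the extended discrete action space described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: all function names are hypothetical, the risk-free rate is assumed to be zero, and positions are assumed to be mapped evenly onto [-1, 1] (full short to full long).

```python
import numpy as np

def sharpe_reward(returns, eps=1e-8):
    # Sharpe-ratio reward: mean return over total volatility.
    # Risk-free rate assumed zero (an assumption, not stated in the abstract).
    returns = np.asarray(returns, dtype=float)
    return returns.mean() / (returns.std() + eps)

def sortino_reward(returns, eps=1e-8):
    # Sortino-ratio reward: like Sharpe, but penalizes only downside deviation,
    # so upside volatility is not punished.
    returns = np.asarray(returns, dtype=float)
    downside = returns[returns < 0.0]
    downside_dev = downside.std() if downside.size else 0.0
    return returns.mean() / (downside_dev + eps)

def action_to_position(n_actions):
    # Map a discrete action space of size n to evenly spaced positions in [-1, 1].
    # n=3 recovers {sell, hold, buy}; n=11 and n=21 give finer position sizing.
    return np.linspace(-1.0, 1.0, n_actions)
```

For example, `action_to_position(3)` yields positions `[-1, 0, 1]` (sell, hold, buy), while `action_to_position(11)` lets the agent trade in increments of 20% of the maximum position, which is one plausible reading of the 11- and 21-action extensions.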
Pages: 39
References
58 entries in total
[1]   An adaptive portfolio trading system: A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown [J].
Almahdi, Saud ;
Yang, Steve Y. .
EXPERT SYSTEMS WITH APPLICATIONS, 2017, 87 :267-279
[2]   Learning dexterous in-hand manipulation [J].
Andrychowicz, Marcin ;
Baker, Bowen ;
Chociej, Maciek ;
Jozefowicz, Rafal ;
McGrew, Bob ;
Pachocki, Jakub ;
Petron, Arthur ;
Plappert, Matthias ;
Powell, Glenn ;
Ray, Alex ;
Schneider, Jonas ;
Sidor, Szymon ;
Tobin, Josh ;
Welinder, Peter ;
Weng, Lilian ;
Zaremba, Wojciech .
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2020, 39 (01) :3-20
[3]  
Angluin D., 1988, Machine Learning, V2, P319, DOI 10.1023/A:1022821128753
[4]  
[Anonymous], 2016, CoRR abs/1606.01540
[5]   ModAugNet: A new forecasting framework for stock market index value with an overfitting prevention LSTM module and a prediction LSTM module [J].
Baek, Yujin ;
Kim, Ha Young .
EXPERT SYSTEMS WITH APPLICATIONS, 2018, 113 :457-480
[6]   A deep learning framework for financial time series using stacked autoencoders and long-short term memory [J].
Bao, Wei ;
Yue, Jun ;
Rao, Yulei .
PLOS ONE, 2017, 12 (07)
[7]  
Beattie C., 2016, DeepMind Lab
[8]   A MARKOVIAN DECISION PROCESS [J].
BELLMAN, R .
JOURNAL OF MATHEMATICS AND MECHANICS, 1957, 6 (05) :679-684
[9]   Automated trading with performance weighted random forests and seasonality [J].
Booth, Ash ;
Gerding, Enrico ;
McGroarty, Frank .
EXPERT SYSTEMS WITH APPLICATIONS, 2014, 41 (08) :3651-3661
[10]  
Bottou L., 1998, ON LINE LEARNING NEU, V17, P142