Action-specialized expert ensemble trading system with extended discrete action space using deep reinforcement learning

Cited by: 0
Authors
Leem, JoonBum [1 ]
Kim, Ha Young [2 ]
Affiliations
[1] Ajou Univ, Dept Financial Engn, Suwon, South Korea
[2] Yonsei Univ, Grad Sch Informat, Seoul, South Korea
Source
PLOS ONE | 2020 / Vol. 15 / Issue 07
Keywords
PREDICTING STOCK; MACHINE; LIQUIDITY; FRAMEWORK; RETURNS;
DOI
10.1371/journal.pone.0236178
CLC Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy, Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject Classification Codes
07 ; 0710 ; 09 ;
Abstract
Despite active research on trading systems based on reinforcement learning, existing methods still require improvement in both design and performance. This study proposes a new action-specialized expert ensemble method consisting of expert models designed specifically for each reinforcement learning action: buy, hold, and sell. The models are constructed by defining distinct reward values that correlate with each action under specific conditions, so that each expert model reflects a different investment behavior. To verify the performance of this technique, the profits of the proposed system are compared to those of single trading and common ensemble systems. To verify robustness under an extended discrete action space, we compared and analyzed the resulting changes in the profits of the three actions against our model's results. Furthermore, we checked for sensitivity with three different reward functions: profit, Sharpe ratio, and Sortino ratio. All experiments were conducted with S&P500, Hang Seng Index, and Eurostoxx50 data. The model was 39.1% and 21.6% more efficient than single and common ensemble models, respectively. When the 3-action space was extended to 11- and 21-action spaces, the cumulative returns increased by 427.2% and 856.7%, respectively. Results on reward functions indicated that our models are well trained; results with the Sharpe and Sortino ratios were better than with profit alone, as in the single-model cases. The Sortino ratio was slightly better than the Sharpe ratio.
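The risk-adjusted reward functions and the extended discrete action space described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: all function names are hypothetical, the risk-free rate is assumed to be zero, and positions are assumed to be mapped evenly onto [-1, 1] (full short to full long).

```python
import numpy as np

def sharpe_reward(returns, eps=1e-8):
    # Sharpe-ratio reward: mean return over total volatility.
    # Risk-free rate assumed zero (an assumption, not stated in the abstract).
    returns = np.asarray(returns, dtype=float)
    return returns.mean() / (returns.std() + eps)

def sortino_reward(returns, eps=1e-8):
    # Sortino-ratio reward: like Sharpe, but penalizes only downside deviation,
    # so upside volatility is not punished.
    returns = np.asarray(returns, dtype=float)
    downside = returns[returns < 0.0]
    downside_dev = downside.std() if downside.size else 0.0
    return returns.mean() / (downside_dev + eps)

def action_to_position(n_actions):
    # Map a discrete action space of size n to evenly spaced positions in [-1, 1].
    # n=3 recovers {sell, hold, buy}; n=11 and n=21 give finer position sizing.
    return np.linspace(-1.0, 1.0, n_actions)
```

For example, `action_to_position(3)` yields positions `[-1, 0, 1]` (sell, hold, buy), while `action_to_position(11)` lets the agent trade in increments of 20% of the maximum position, which is one plausible reading of the 11- and 21-action extensions.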
Pages: 39
References
58 entries in total
[1]   An adaptive portfolio trading system: A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown [J].
Almahdi, Saud ;
Yang, Steve Y. .
EXPERT SYSTEMS WITH APPLICATIONS, 2017, 87 :267-279
[2]   Learning dexterous in-hand manipulation [J].
Andrychowicz, Marcin ;
Baker, Bowen ;
Chociej, Maciek ;
Jozefowicz, Rafal ;
McGrew, Bob ;
Pachocki, Jakub ;
Petron, Arthur ;
Plappert, Matthias ;
Powell, Glenn ;
Ray, Alex ;
Schneider, Jonas ;
Sidor, Szymon ;
Tobin, Josh ;
Welinder, Peter ;
Weng, Lilian ;
Zaremba, Wojciech .
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2020, 39 (01) :3-20
[3]  
Angluin D., 1988, Machine Learning, V2, P319, DOI 10.1023/A:1022821128753
[4]  
[Anonymous], 2016, CoRR abs/1606.01540
[5]   ModAugNet: A new forecasting framework for stock market index value with an overfitting prevention LSTM module and a prediction LSTM module [J].
Baek, Yujin ;
Kim, Ha Young .
EXPERT SYSTEMS WITH APPLICATIONS, 2018, 113 :457-480
[6]   A deep learning framework for financial time series using stacked autoencoders and long-short term memory [J].
Bao, Wei ;
Yue, Jun ;
Rao, Yulei .
PLOS ONE, 2017, 12 (07)
[7]  
Beattie C., 2016, DeepMind Lab
[8]   A MARKOVIAN DECISION PROCESS [J].
BELLMAN, R .
JOURNAL OF MATHEMATICS AND MECHANICS, 1957, 6 (05) :679-684
[9]   Automated trading with performance weighted random forests and seasonality [J].
Booth, Ash ;
Gerding, Enrico ;
McGroarty, Frank .
EXPERT SYSTEMS WITH APPLICATIONS, 2014, 41 (08) :3651-3661
[10]  
Bottou L., 1998, ON LINE LEARNING NEU, V17, P142