Explainable Machine Learning Exploiting News and Domain-Specific Lexicon for Stock Market Forecasting

Times cited: 34
Authors
Carta, Salvatore M. [1]
Consoli, Sergio [2]
Piras, Luca [1]
Podda, Alessandro Sebastian [1]
Recupero, Diego Reforgiato [1]
Affiliations
[1] Univ Cagliari, Dept Math & Comp Sci, I-09124 Cagliari, Italy
[2] European Commiss, Joint Res Ctr DG JRC, I-21027 Ispra, Italy
Keywords
Forecasting; Social networking (online); Companies; Stock markets; Feature extraction; Task analysis; Prediction algorithms; Stock market forecasting; machine learning; natural language processing; financial technology; explainable artificial intelligence; PREDICTION; RETURN
DOI
10.1109/ACCESS.2021.3059960
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this manuscript, we propose a Machine Learning approach to a binary classification problem: predicting the magnitude (high or low) of future stock price variations for individual companies in the S&P 500 index. Sets of lexicons are generated from articles published worldwide, with the goal of identifying the words that have the greatest impact on the market within a specific time interval and business sector. A feature engineering process is then applied to the generated lexicons, and the resulting features are fed to a Decision Tree classifier. The predicted label (high or low) indicates whether the company's stock price variation on the next day will be higher or lower than a given threshold. A performance evaluation carried out with a walk-forward strategy against a set of solid baselines shows that our approach clearly outperforms the competitors. Moreover, the proposed Artificial Intelligence (AI) approach is explainable: we analyze the white-box model behind the classifier and provide a set of explanations for the obtained results.
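The abstract describes the pipeline only at a high level, so the following is a minimal, standard-library-only sketch of how its pieces could fit together, not the authors' implementation. Everything concrete in it is an assumption: the word-scoring rule in the hypothetical `build_lexicon` (mean next-day change of the days whose news mentions a word, keeping the k highest and k lowest scorers), the two count features, the comparison of the signed (rather than absolute) change against the threshold, and the use of a depth-1 decision tree (a decision stump) as a stand-in for the paper's full Decision Tree classifier. Per-company, per-sector, and per-time-interval lexicons are collapsed into a single stream of daily news, and all function names and window sizes are illustrative.

```python
from collections import defaultdict
from statistics import mean


def next_day_changes(closes):
    """Relative change from close t to close t+1, for every day t with a successor."""
    return [(closes[t + 1] - closes[t]) / closes[t] for t in range(len(closes) - 1)]


def high_low_labels(changes, threshold):
    """1 ("high") if the next-day change exceeds the threshold, else 0 ("low").
    Assumption: the signed change is compared, not its absolute value."""
    return [1 if c > threshold else 0 for c in changes]


def build_lexicon(day_words, day_changes, k):
    """Hypothetical scoring rule: score each word by the mean next-day change of
    the days whose news mentions it; keep the k highest and k lowest scorers."""
    hits = defaultdict(list)
    for words, change in zip(day_words, day_changes):
        for w in set(words):
            hits[w].append(change)
    ranked = sorted(hits, key=lambda w: mean(hits[w]))
    return set(ranked[-k:]), set(ranked[:k])


def features(words, positive, negative):
    """Two hypothetical features: counts of positive- and negative-lexicon words."""
    return (sum(w in positive for w in words), sum(w in negative for w in words))


def fit_stump(xs, ys):
    """Depth-1 decision tree (stand-in for a full tree): choose the feature and
    cut whose rule 'predict 1 if x[f] > cut' makes the fewest training errors."""
    best = None
    for f in range(len(xs[0])):
        cuts = sorted({x[f] for x in xs})
        for cut in [cuts[0] - 1] + cuts:  # the first cut predicts "high" everywhere
            errors = sum((x[f] > cut) != bool(y) for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, f, cut)
    _, f, cut = best
    return lambda x: int(x[f] > cut)


def walk_forward(n, train_size, test_size):
    """Yield (train, test) index ranges; each test window directly follows its
    training window, and the pair slides forward by test_size."""
    start = 0
    while start + train_size + test_size <= n:
        mid = start + train_size
        yield range(start, mid), range(mid, mid + test_size)
        start += test_size


def evaluate(day_words, closes, threshold, k, train_size, test_size):
    """Walk-forward accuracy: the lexicon and the classifier are rebuilt from
    each training window only, then scored on the window that follows it."""
    changes = next_day_changes(closes)
    labels = high_low_labels(changes, threshold)
    correct = total = 0
    for train, test in walk_forward(len(labels), train_size, test_size):
        pos, neg = build_lexicon([day_words[i] for i in train],
                                 [changes[i] for i in train], k)
        model = fit_stump([features(day_words[i], pos, neg) for i in train],
                          [labels[i] for i in train])
        for i in test:
            correct += model(features(day_words[i], pos, neg)) == labels[i]
            total += 1
    return correct / total


# Toy data (invented for illustration): "beat" days rise about 2-3%, "miss" days fall about 1%.
closes = [100, 102, 101, 104, 103, 106, 105, 108, 107, 110, 109]
news = [["beat", "earnings"] if t % 2 == 0 else ["miss", "earnings"] for t in range(10)]
print(evaluate(news, closes, threshold=0.01, k=1, train_size=4, test_size=2))  # prints 1.0
```

Rebuilding the lexicon inside each training window is the point of the walk-forward loop: a lexicon scored on the whole series would leak future price changes into the features of earlier test days.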
Pages: 30193-30205
Page count: 13