Combining bag-of-words and sentiment features of annual reports to predict abnormal stock returns

Cited by: 47
Authors
Hajek, Petr [1]
Affiliations
[1] Univ Pardubice, Inst Syst Engn & Informat, Fac Econ & Adm, Studentska 84, Pardubice, Czech Republic
Keywords
Stock return; Prediction; Text mining; Sentiment; Neural network; INFORMATION-CONTENT; FEATURE-SELECTION; MARKET PREDICTION; TEXTUAL ANALYSIS; NEWS; EARNINGS; MEDIA; PRICE; READABILITY; DROPOUT;
DOI
10.1007/s00521-017-3194-2
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automated textual analysis of firm-related documents has become an important decision support tool for stock market investors. Previous studies tended to adopt either a dictionary-based or a machine learning approach. Nevertheless, little is known about their concurrent use. Here we use the combination of financial indicators, readability, sentiment categories, and bag-of-words (BoW) to increase prediction accuracy. This paper aims to extract both sentiment and BoW information from the annual reports of US firms. The sentiment analysis is based on two commonly used dictionaries, namely a general dictionary Diction 7.0 and a finance-specific dictionary proposed by Loughran and McDonald (J Finance 66:35-65, 2011. doi:10.1111/j.1540-6261.2010.01625.x). The BoW are selected according to their tf-idf. We combine these features with financial indicators to predict abnormal stock returns using a multilayer perceptron neural network with dropout regularization and rectified linear units. We show that this method performs similarly to naïve Bayes and outperforms other machine learning algorithms (support vector machine, C4.5 decision tree, and k-nearest neighbour classifier) in predicting positive/negative abnormal stock returns in terms of ROC. We also show that the quality of the prediction significantly increased when using the correlation-based feature selection of BoW. This prediction performance is robust to industry categorization and event window.
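The abstract states that the BoW are selected according to their tf-idf but does not give the exact weighting or selection rule. The following is a minimal sketch of one common variant, assuming term frequency normalized by document length, idf = log(N/df), and ranking of terms by their maximum score across documents. The function names (`tfidf_scores`, `top_terms`) and the toy report snippets are hypothetical illustrations, not the author's implementation or data.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(docs, k):
    """Select the k terms with the highest tf-idf in any document."""
    best = {}
    for doc_scores in tfidf_scores(docs):
        for term, score in doc_scores.items():
            best[term] = max(best.get(term, 0.0), score)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Toy, made-up report snippets (not data from the paper).
reports = [
    "revenue growth strong revenue".split(),
    "litigation risk loss".split(),
    "revenue loss impairment".split(),
]
vocabulary = top_terms(reports, k=3)
```

In this toy corpus, terms that appear in several documents (such as "revenue" and "loss") receive a lower idf and are ranked below terms concentrated in a single document. The selected vocabulary could then serve as the BoW feature columns that are combined with sentiment and financial indicators.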
Pages: 343-358
Page count: 16
Related papers
80 in total
[1]   The Structure of Voluntary Disclosure Narratives: Evidence from Tone Dispersion [J].
Allee, Kristian D. ;
Deangelis, Matthew D. .
JOURNAL OF ACCOUNTING RESEARCH, 2015, 53 (02) :241-274
[2]  
[Anonymous], 2004, ACM SIGKDD Explor. Newsl.
[3]  
[Anonymous], 2012, Technical Report
[4]   Is all that talk just noise? The information content of Internet stock message boards [J].
Antweiler, W ;
Frank, MZ .
JOURNAL OF FINANCE, 2004, 59 (03) :1259-1294
[5]   On the predictive ability of narrative disclosures in annual reports [J].
Balakrishnan, Ramji ;
Qiu, Xin Ying ;
Srinivasan, Padmini .
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2010, 202 (03) :789-801
[6]   Selective Dropout for Deep Neural Networks [J].
Barrow, Erik ;
Eastwood, Mark ;
Jayne, Chrisina .
NEURAL INFORMATION PROCESSING, ICONIP 2016, PT III, 2016, 9949 :519-528
[7]   Using 10-K Text to Gauge Financial Constraints [J].
Bodnaruk, Andriy ;
Loughran, Tim ;
McDonald, Bill .
JOURNAL OF FINANCIAL AND QUANTITATIVE ANALYSIS, 2015, 50 (04) :623-646
[8]  
Butler M, 2009, LECT NOTES COMPUT SC, V5549, P39, DOI 10.1007/978-3-642-01818-3_7
[9]  
Crain S.P., 2012, MINING TEXT DATA, P129, DOI 10.1007/978-1-4614-3223-4_5
[10]   Beyond the Numbers: Measuring the Information Content of Earnings Press Release Language [J].
Davis, Angela K. ;
Piger, Jeremy M. ;
Sedor, Lisa M. .
CONTEMPORARY ACCOUNTING RESEARCH, 2012, 29 (03) :845-+