The R Package sentometrics to Compute, Aggregate, and Predict with Textual Sentiment

Cited by: 9
Authors
Ardia, David [1, 2]
Bluteau, Keven [3]
Borms, Samuel [4, 5]
Boudt, Kris [5, 6, 7]
Affiliations
[1] HEC Montreal, Dept Decis Sci, Montreal, PQ, Canada
[2] Gerad, Montreal, PQ, Canada
[3] Univ Sherbrooke, Dept Finance, Sherbrooke, PQ, Canada
[4] Univ Neuchatel, Inst Financial Anal, Neuchatel, Switzerland
[5] Vrije Univ Brussel, Solvay Business Sch, Ixelles, Belgium
[6] Univ Ghent, Dept Econ, Ghent, Belgium
[7] Vrije Univ Amsterdam, Sch Business & Econ, Amsterdam, Netherlands
Source
JOURNAL OF STATISTICAL SOFTWARE | 2021 / Volume 99 / Issue 02
Funding
Swiss National Science Foundation;
Keywords
aggregation; penalized regression; prediction; R; sentometrics; textual sentiment; time series; NEWS; REGULARIZATION; REGRESSION; SELECTION; MODELS; LASSO;
DOI
10.18637/jss.v099.i02
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
We provide a hands-on introduction to optimized textual sentiment indexation using the R package sentometrics. Textual sentiment analysis is increasingly used to unlock the potential information value of textual data. The sentometrics package implements an intuitive framework to efficiently compute sentiment scores of numerous texts, to aggregate the scores into multiple time series, and to use these time series to predict other variables. The workflow of the package is illustrated with a built-in corpus of news articles from two major U.S. journals to forecast the CBOE Volatility Index.
Pages: 1-40
Page count: 40