Predicting Socio-Economic Indicators using News Events

Times cited: 18
Authors
Chakraborty, Sunandan [1, 2]
Venkataraman, Ashwin [1]
Jagabathula, Srikanth [3]
Subramanian, Lakshminarayanan [1, 2]
Affiliations
[1] NYU Abu Dhabi, Dept Comp Sci, Abu Dhabi, U Arab Emirates
[2] NYU Abu Dhabi, Ctr Technol & Econ Dev, Abu Dhabi, U Arab Emirates
[3] NYU, Leonard N Stern Sch Business, New York, NY USA
Source
KDD'16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining | 2016
Keywords
MEDIA
DOI
10.1145/2939672.2939817
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many socio-economic indicators are sensitive to real-world events. Properly characterizing these events can help identify the relevant events that drive fluctuations in these indicators. In this paper, we propose a novel generative model of real-world events and employ it to extract events from a large corpus of news articles. We introduce the notion of an event class, which is an abstract grouping of similarly themed events. These event classes are manifested in news articles in the form of event triggers, which are specific words that describe the actions or incidents reported in an article. We use the extracted events to predict fluctuations in different socio-economic indicators. Specifically, we focus on food prices and predict the prices of 12 different crops based on real-world events that potentially influence food price volatility, such as transport strikes and festivals. Our experiments demonstrate that incorporating event information in the prediction tasks reduces the root mean square error (RMSE) of prediction by 22% compared to the standard ARIMA model. We also predict sudden increases in food prices (i.e., spikes) using events as features and achieve an average 5-10% increase in accuracy compared to baseline models, including a predictive model based on an LDA topic model.
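The abstract's central comparison, a time-series baseline versus the same baseline augmented with event features, can be illustrated with a minimal sketch. This is not the authors' method. It swaps in a plain autoregressive model with exogenous regressors (an ARX model fit by ordinary least squares) for ARIMA, uses one synthetic binary "event" column in place of the paper's extracted event classes, and assumes the reported 22% is a relative RMSE reduction. The function names `lagged_design`, `one_step_rmse`, and `relative_reduction`, and the synthetic data, are hypothetical.

```python
import numpy as np


def lagged_design(prices, events, p):
    """Build an AR(p) design matrix, optionally with lagged event features.

    Hypothetical layout: the row for target y[t] (t = p..T-1) is
    [1, y[t-1], ..., y[t-p]] followed by events[t-1, :] when events is given.
    """
    prices = np.asarray(prices, dtype=float)
    ev = None if events is None else np.asarray(events, dtype=float)
    rows = []
    for t in range(p, len(prices)):
        row = [1.0] + [prices[t - k] for k in range(1, p + 1)]
        if ev is not None:
            row.extend(ev[t - 1])  # previous day's event indicators
        rows.append(row)
    return np.array(rows), prices[p:]


def one_step_rmse(prices, events, p, n_train):
    """Fit by least squares on the first n_train targets; return test RMSE."""
    X, y = lagged_design(prices, events, p)
    coef, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)
    pred = X[n_train:] @ coef  # one-step-ahead forecasts from observed lags
    return float(np.sqrt(np.mean((y[n_train:] - pred) ** 2)))


def relative_reduction(rmse_base, rmse_new):
    """Fractional RMSE reduction of the new model relative to the baseline."""
    return (rmse_base - rmse_new) / rmse_base


# Synthetic data (hypothetical): a price series that jumps the day after an "event".
rng = np.random.default_rng(0)
T = 200
events = rng.binomial(1, 0.1, size=(T, 1))
prices = np.empty(T)
prices[0] = 10.0
for t in range(1, T):
    prices[t] = 2.0 + 0.8 * prices[t - 1] + 5.0 * events[t - 1, 0] + rng.normal(0.0, 0.5)

rmse_ar = one_step_rmse(prices, None, p=1, n_train=150)
rmse_events = one_step_rmse(prices, events, p=1, n_train=150)
print(f"AR-only RMSE:       {rmse_ar:.3f}")
print(f"AR + events RMSE:   {rmse_events:.3f}")
print(f"Relative reduction: {relative_reduction(rmse_ar, rmse_events):.1%}")
```

One-step-ahead forecasting from observed lags keeps the comparison simple. A differenced (integrated) model with moving-average terms would be closer to a full ARIMA baseline.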
Pages: 1455-1464
Number of pages: 10