Flu Gone Viral: Syndromic Surveillance of Flu on Twitter using Temporal Topic Models

Cited by: 27
Authors
Chen, Liangzhe [1]
Hossain, K. S. M. Tozammel [1]
Butler, Patrick [1]
Ramakrishnan, Naren [1]
Prakash, B. Aditya [1]
Affiliations
[1] Virginia Tech, Dept Comp Sci, Blacksburg, VA 24061 USA
Source
2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM) | 2014
Keywords
GLOBAL STABILITY;
DOI
10.1109/ICDM.2014.137
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Surveillance of epidemic outbreaks and spread from social media is an important tool for governments and public health authorities. Machine learning techniques for nowcasting the flu have made significant inroads into correlating social media trends to case counts and prevalence of epidemics in a population. There is a disconnect between data-driven methods for forecasting flu incidence and epidemiological models that adopt a state-based understanding of transitions, which can lead to sub-optimal predictions. Furthermore, models for epidemiological activity and for social activity, such as on Twitter, predict different shapes and have important differences. We propose a temporal topic model to capture hidden states of a user from their tweets and aggregate states in a geographical region for better estimation of trends. We show that our approach helps fill the gap between phenomenological methods for disease surveillance and epidemiological models. We validate this approach by modeling the flu using Twitter in multiple countries of South America. We demonstrate that our model can consistently outperform plain vocabulary assessment in flu case-count predictions, and at the same time get better flu-peak predictions than competitors. We also show that our fine-grained modeling can reconcile some contrasting behaviors between epidemiological and social models.
Pages: 755-760
Page count: 6