Real-time estimation of disease activity in emerging outbreaks using internet search information

Cited by: 13
Authors
Aiken, Emily L. [1]
McGough, Sarah F. [2]
Majumder, Maimuna S. [3]
Wachtel, Gal [4]
Nguyen, Andre T. [5, 6]
Viboud, Cecile [7]
Santillana, Mauricio [1, 4, 8]
Affiliations
[1] Harvard Univ, Sch Engn & Appl Sci, Cambridge, MA 02138 USA
[2] Harvard TH Chan Sch Publ Hlth, Boston, MA USA
[3] Harvard Med Sch, Dept Healthcare Policy, Boston, MA 02115 USA
[4] Boston Childrens Hosp, Computat Hlth Informat Program, Boston, MA 02115 USA
[5] Booz Allen Hamilton, Columbia, MD USA
[6] Univ Maryland, Baltimore, MD 21201 USA
[7] NIH, Fogarty Int Ctr, Bldg 10, Bethesda, MD 20892 USA
[8] Harvard Med Sch, Dept Pediat, Boston, MA 02115 USA
Funding
US National Institutes of Health;
Keywords
FLU;
DOI
10.1371/journal.pcbi.1008117
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Understanding the behavior of emerging disease outbreaks in, or ahead of, real time could help healthcare officials better design interventions to mitigate impacts on affected populations. Most healthcare-based disease surveillance systems, however, have significant inherent reporting delays due to data collection, aggregation, and distribution processes. Recent work has shown that machine learning methods leveraging a combination of traditionally collected epidemiological information and novel Internet-based data sources, such as disease-related Internet search activity, can produce meaningful "nowcasts" of disease incidence ahead of healthcare-based estimates, with most successful case studies focusing on endemic and seasonal diseases such as influenza and dengue. Here, we apply similar computational methods to emerging outbreaks in geographic regions where no historical presence of the disease of interest has been observed. By combining the limited historical epidemiological data available with disease-related Internet search activity, we retrospectively estimate disease activity in five recent outbreaks weeks ahead of traditional surveillance methods. We find that the proposed computational methods frequently provide useful real-time incidence estimates that can help fill temporal data gaps resulting from surveillance reporting delays. However, the proposed methods are limited by issues of sample bias and skew in search query volumes, perhaps as a result of media coverage.
Author summary
Public health officials regularly make choices about treatment and prevention in disease outbreaks that have the potential to impact entire affected populations. Often these decisions are based on incomplete or unreliable information due to inherent reporting delays in healthcare-based disease surveillance systems. This issue of public health decision-making based on limited data is even more salient in emerging outbreaks, which are typically characterized by uncertain disease dynamics and limited surveillance capacity. We demonstrate the potential for using digital trace data (in this case, Internet-based information from Google search trends) for estimating disease activity in emerging outbreaks in the absence of accurate real-time healthcare-based data sources. We evaluate how data-driven methods leveraging search trend data would have performed in real time in five recent outbreaks (yellow fever in Angola, Zika in Colombia, Ebola in the DRC, plague in Madagascar, and cholera in Yemen), and find that the methods frequently provide useful signals of disease activity ahead of standard healthcare-based surveillance methods.
Pages: 19
Related papers
30 items in total
[1] [Anonymous], 2016, J LARGE SCALE RES FA, V2, pA49
[2] [Anonymous], 2019, CANC DISCOV, V10, pOF5
[3] [Anonymous], 2019, WHO REGIONAL OFFICE
[4] [Anonymous], 2018, DIABETES CARE S1, V4, pS4, DOI 10.2337/dc18-Srev01
[5] [Anonymous], Ebola in the Democratic Republic of the Congo: North Kivu, Ituri 2018-2020
[6] [Anonymous], 2011, J NATL COMPR CANC NE
[7] [Anonymous], 2016, CANC DISCOV
[8]   Flexible Modeling of Epidemics with an Empirical Bayes Framework [J].
Brooks, Logan C. ;
Farrow, David C. ;
Hyun, Sangwon ;
Tibshirani, Ryan J. ;
Rosenfeld, Roni .
PLOS COMPUTATIONAL BIOLOGY, 2015, 11 (08)
[9]   Using Web Search Query Data to Monitor Dengue Epidemics: A New Model for Neglected Tropical Disease Surveillance [J].
Chan, Emily H. ;
Sahai, Vikram ;
Conrad, Corrie ;
Brownstein, John S. .
PLOS NEGLECTED TROPICAL DISEASES, 2011, 5 (05)
[10]   Social and News Media Enable Estimation of Epidemiological Patterns Early in the 2010 Haitian Cholera Outbreak [J].
Chunara, Rumi ;
Andrews, Jason R. ;
Brownstein, John S. .
AMERICAN JOURNAL OF TROPICAL MEDICINE AND HYGIENE, 2012, 86 (01) :39-45