The Reliability of Tweets as a Supplementary Method of Seasonal Influenza Surveillance

Cited by: 64
Authors
Aslam, Anoshe A. [1]
Tsou, Ming-Hsiang [2]
Spitzberg, Brian H. [3]
An, Li [2]
Gawron, J. Mark [4]
Gupta, Dipak K. [5]
Peddecord, K. Michael [1]
Nagel, Anna C. [1]
Allen, Christopher [2]
Yang, Jiue-An [2]
Lindsay, Suzanne [1]
Affiliations
[1] San Diego State Univ, Grad Sch Publ Hlth, San Diego, CA 92182 USA
[2] San Diego State Univ, Dept Geog, San Diego, CA 92115 USA
[3] San Diego State Univ, Sch Commun, San Diego, CA 92115 USA
[4] San Diego State Univ, Dept Linguist, San Diego, CA 92115 USA
[5] San Diego State Univ, Dept Polit Sci, San Diego, CA 92115 USA
Funding
National Science Foundation (USA);
Keywords
Twitter; tweets; infoveillance; infodemiology; syndromic surveillance; influenza; Internet; SOCIAL MEDIA; TWITTER; WEB; US;
DOI
10.2196/jmir.3532
Chinese Library Classification number
R19 [Health care organizations and services (health services administration)];
Abstract
Background: Existing influenza surveillance in the United States relies on data collected from sentinel physicians and hospitals; however, the compilation and distribution of reports are usually delayed by up to 2 weeks. With the growing popularity of social media, the Internet has become a potential source for syndromic surveillance because of the large amounts of data available. In this study, tweets (posts of 140 characters or fewer) from the website Twitter were collected and analyzed for their potential as a surveillance tool for seasonal influenza.

Objective: There were three aims: (1) to improve the correlation of tweets with sentinel-provided influenza-like illness (ILI) rates by city through filtering and a machine-learning classifier, (2) to observe correlations of tweets with emergency department ILI rates by city, and (3) to explore correlations of tweets with laboratory-confirmed influenza cases in San Diego.

Methods: Tweets containing the keyword "flu" were collected within a 17-mile radius of 11 US cities selected for population and availability of ILI data. At the end of the collection period, 159,802 tweets were used for correlation analyses with sentinel-provided ILI and emergency department ILI rates as reported by the corresponding city or county health department. Two separate methods were used to observe correlations between tweets and ILI rates: filtering the tweets by type (non-retweets, retweets, tweets with a URL, tweets without a URL), and using a machine-learning classifier that determined whether a tweet was "valid", that is, from a user who was likely ill with the flu.

Results: Correlations varied by city, but general trends were observed. Non-retweets and tweets without a URL had higher and more significant (P<.05) correlations than retweets and tweets with a URL. Correlations of tweets with emergency department ILI rates were higher than those observed for sentinel-provided ILI for most of the cities. The machine-learning classifier yielded the highest correlations for many of the cities when using the sentinel-provided or emergency department ILI as well as the number of laboratory-confirmed influenza cases in San Diego. High correlation values (r=.93) with significance at P<.001 were observed for laboratory-confirmed influenza cases for most categories and tweets determined to be valid by the classifier.

Conclusions: Compared with tweet analyses in the previous influenza season, this study demonstrated increased accuracy in using Twitter as a supplementary surveillance tool for influenza: better filtering and classification methods yielded higher correlations for the 2013-2014 influenza season than those found for tweets in the previous influenza season, in which emergency department ILI rates were better correlated with tweets than sentinel-provided ILI rates. Further investigations in the field would require expansion with regard to the locations from which tweets are collected, as well as the availability of more ILI data.
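The filter-then-correlate approach described in the Methods can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `RT @` prefix test for retweets, the `http` substring test for URLs, weekly aggregation, the use of Pearson's r, and the sample tweets and ILI rates, which are invented. The machine-learning classifier is not sketched, since the abstract gives no details about it.

```python
from collections import Counter
from math import sqrt


def tweet_type_flags(text):
    """Return (is_retweet, has_url) using simple text heuristics.

    Both heuristics are assumptions, not the study's actual rules.
    """
    is_retweet = text.startswith("RT @")
    has_url = "http://" in text or "https://" in text
    return is_retweet, has_url


def weekly_counts(tweets, keep):
    """Count tweets per week for which keep(text) is true.

    tweets is an iterable of (week_number, text) pairs.
    """
    counts = Counter()
    for week, text in tweets:
        if keep(text):
            counts[week] += 1
    return counts


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Invented sample data: (week, tweet text) and weekly ILI rates in percent.
tweets = [
    (1, "I think I have the flu"),
    (1, "RT @user: flu season is here"),
    (2, "Home sick with the flu again"),
    (2, "Flu shot info http://example.com"),
    (2, "flu is the worst, staying in bed"),
    (3, "ugh, flu"),
    (3, "so much flu going around at work"),
    (3, "my kid has the flu"),
    (3, "RT @user: get your flu shot https://example.com"),
]
ili_rate = {1: 1.2, 2: 2.1, 3: 3.4}


def original_without_url(text):
    """Keep only non-retweets that contain no URL."""
    is_retweet, has_url = tweet_type_flags(text)
    return not is_retweet and not has_url


counts = weekly_counts(tweets, original_without_url)
weeks = sorted(ili_rate)
r = pearson_r([counts[w] for w in weeks], [ili_rate[w] for w in weeks])
print(f"Pearson r for non-retweets without a URL: {r:.3f}")
```

Swapping `original_without_url` for another predicate (for example, retweets only) gives the per-type comparison the abstract describes; a real analysis would also need a significance test and far more than three weeks of data.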
Pages: 12
Related papers
25 in total
  • [1] Achrekar H, 2011, P CPNS 2011 2011 1 I
  • [2] [Anonymous], 2009, P 2009 INT WORKSH LO, DOI 10.1145/1629890.1629907
  • [3] Syndromic surveillance of Influenza-like illness in primary care: a complement to the sentinel surveillance network for periods of increased incidence of Influenza
    Arranz Izquierdo, J.
    Leiva Rus, A.
    Carandell Jaeger, E.
    Pujol Buades, A.
    Mendez Castell, M. C.
    Salva Fiol, A.
    Esteva Canto, M.
    [J]. ATENCION PRIMARIA, 2012, 44 (05) : 258 - 264
  • [4] Brownstein JS, 2009, NEW ENGL J MED, V360, P2153, DOI [10.1056/NEJMp0900702, 10.1056/NEJMp0904012]
  • [5] Buehler JW., 2004, Framework for Evaluating Public Health Surveillance Systems for Early Detection of Outbreaks: Recommendations from the CDC Working Group
  • [6] Centers for Disease Control and Prevention, 2013, WORKPL HLTH PROM AD
  • [7] Centers for Disease Control and Prevention, 2013, FLU SYMPT SEV
  • [8] Centers for Disease Control and Prevention, 2013, OV INFL SURV US
  • [9] Pandemics in the Age of Twitter: Content Analysis of Tweets during the 2009 H1N1 Outbreak
    Chew, Cynthia
    Eysenbach, Gunther
    [J]. PLOS ONE, 2010, 5 (11):
  • [10] Social and News Media Enable Estimation of Epidemiological Patterns Early in the 2010 Haitian Cholera Outbreak
    Chunara, Rumi
    Andrews, Jason R.
    Brownstein, John S.
    [J]. AMERICAN JOURNAL OF TROPICAL MEDICINE AND HYGIENE, 2012, 86 (01) : 39 - 45