Twitter-Based Influenza Detection After Flu Peak via Tweets With Indirect Information: Text Mining Study

Cited by: 56
Authors
Wakamiya, Shoko [1]
Kawai, Yukiko [2,3]
Aramaki, Eiji [1]
Affiliations
[1] Nara Inst Sci & Technol, 8916-5 Takayama Cho, Ikoma 6300192, Japan
[2] Kyoto Sangyo Univ, Kyoto, Japan
[3] Osaka Univ, Osaka, Japan
Keywords
influenza surveillance; location mention; Twitter; social network; spatial analysis; internet; microblog; infodemiology; infoveillance; SENTIMENT; SPREAD; RUMOR
DOI
10.2196/publichealth.8627
Chinese Library Classification (CLC) number
R1 [Preventive Medicine, Hygiene]
Subject classification codes
1004; 120402
Abstract
Background: The recent rise in the popularity and scale of social networking services (SNSs) has created a growing need for SNS-based information extraction systems. A popular application of SNS data is health surveillance, which predicts epidemic outbreaks by detecting diseases in text messages posted on SNS platforms. Such applications share the same logic: they treat SNS users as social sensors. These social sensor-based approaches also share a common problem: SNS-based surveillance is much more reliable when sufficient numbers of users are active, and small or inactive populations produce inconsistent results.
Objective: This study proposes a novel approach to estimating the trend of patient numbers using indirect information within the posts, covering both urban and rural areas.
Methods: We presented a TRAP model that embeds both direct information and indirect information. A collection of tweets spanning 3 years (7 million influenza-related tweets in Japanese) was used to evaluate the model. Both direct information and indirect information that mentions other places were used. Because indirect information is less reliable (too noisy or too old) than direct information, the indirect information was not used directly; instead, it was treated as inhibiting direct information. For example, when indirect information appeared often, it was taken to signify that everyone already had a known disease, leading to a small amount of direct information.
Results: The estimation performance of our approach was evaluated using the correlation coefficient between the number of influenza cases (the gold standard values) and the values estimated by the proposed models. The baseline model (BASELINE+NLP) achieved a correlation of .36, whereas the proposed model (TRAP+NLP) achieved .70, an improvement of .34 points.
Conclusions: The proposed approach, in which indirect information inhibits direct information, improved estimation performance not only in rural cities but also in urban cities, demonstrating the effectiveness of the proposed method, which combines a TRAP model with natural language processing (NLP) classification.
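The evaluation step described in the Results can be sketched as follows. This is a minimal illustration only, assuming the Pearson correlation coefficient (the abstract says just "correlation coefficient"). The function name `pearson` and the weekly counts are hypothetical and are not taken from the study; the sketch does not implement the TRAP model itself.

```python
from math import sqrt


def pearson(gold, estimated):
    """Pearson correlation between gold-standard counts and model estimates.

    Assumption: the abstract does not name the coefficient; Pearson is
    used here purely for illustration.
    """
    if len(gold) != len(estimated) or len(gold) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(gold)
    mean_g = sum(gold) / n
    mean_e = sum(estimated) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((g - mean_g) * (e - mean_e) for g, e in zip(gold, estimated))
    var_g = sum((g - mean_g) ** 2 for g in gold)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    if var_g == 0 or var_e == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_g * var_e)


# Hypothetical weekly values, made up for this example only.
reported_cases = [120, 340, 910, 1500, 980, 400]   # gold standard
tweet_estimates = [100, 300, 800, 1300, 1100, 500]  # model output
score = pearson(reported_cases, tweet_estimates)
```

A score near 1 means the estimated trend tracks the reported case counts closely, which is how figures such as .36 and .70 in the abstract would be read under this assumption.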
Pages: 41-55
Number of pages: 15