Analysis of longitudinal social media for monitoring symptoms during a pandemic

Cited by: 0
Authors
Lin, Shixu [1]
Garay, Lucas [1]
Hua, Yining [2, 3, 4]
Guo, Zhijiang [5]
Li, Wanxin [1]
Li, Minghui [1]
Zhang, Yujie [1]
Xu, Xiaolin [1]
Yang, Jie [1, 6, 7]
Affiliations
[1] Zhejiang Univ, Sch Med, Sch Publ Hlth, Hangzhou 310058, Peoples R China
[2] Harvard TH Chan Sch Publ Hlth, Dept Epidemiol, Boston, MA 02115 USA
[3] Harvard Med Sch, Dept Biomed Informat, Boston, MA 02115 USA
[4] Brigham & Womens Hosp, Div Gen Internal Med, Boston, MA 02115 USA
[5] Univ Cambridge, Dept Comp Sci & Technol, Cambridge, England
[6] Brigham & Womens Hosp, Dept Med, Boston, MA 02115 USA
[7] Harvard Med Sch, Boston, MA 02115 USA
Keywords
Natural language processing; Deep learning; Social media; Public health; COVID-19; Symptom surveillance; ASSOCIATION; INFECTION;
DOI
10.1016/j.jbi.2025.104778
Chinese Library Classification number
TP39 [Computer applications];
Subject classification code
081203 ; 0835 ;
Abstract
Objective: Current studies that leverage social media data for disease monitoring face challenges such as noisy colloquial language and insufficient tracking of users' disease progression in longitudinal data settings. This study aims to develop a pipeline for collecting, cleaning, and analyzing large-scale longitudinal social media data for disease monitoring, with a focus on the COVID-19 pandemic.
Materials and methods: The pipeline begins by screening COVID-19 cases from tweets spanning February 1, 2020, to April 30, 2022. Longitudinal data are collected for each patient, from two months before to three months after self-reporting. Symptoms are extracted using Named Entity Recognition (NER), followed by denoising with a combination of a Graph Convolutional Network (GCN) and a Bidirectional Encoder Representations from Transformers (BERT) model to retain only User-experienced Symptom Mentions (USM). Subsequently, symptoms are mapped to standardized medical concepts using the Unified Medical Language System (UMLS). Finally, the study conducts symptom pattern analysis and visualization to illustrate temporal changes in symptom prevalence and co-occurrence.
Results: This study identified 191,096 self-reported COVID-19-positive cases from COVID-19-related tweets and retrospectively collected 811,398,280 historical tweets, of which 2,120,964 contained symptom information. After denoising, 39% (832,287) of symptom-sharing tweets reflected user-experienced mentions. The trained USM model achieved an average F1 score of 0.927. Further analysis revealed a higher prevalence of upper respiratory tract symptoms during the Omicron period compared to the Delta and Wild-type periods. Additionally, there was a pronounced co-occurrence of lower respiratory tract and nervous system symptoms in the Wild-type strain and Delta variant.
Conclusion: This study established a robust framework for analyzing longitudinal social media data to monitor symptoms during a pandemic. By integrating denoising of user-experienced symptom mentions, our findings reveal the duration of different symptoms over time and by variant within a cohort of nearly 200,000 patients, providing critical insights into symptom trends that are often difficult to capture through traditional data sources.
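The windowing and co-occurrence steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the input shape (a list of `(date, set of symptom concepts)` pairs per user), and the approximation of "two months before and three months after" as 60 and 90 days are all assumptions made for illustration. The NER, GCN/BERT denoising, and UMLS mapping steps are assumed to have already produced the symptom sets.

```python
from collections import Counter
from datetime import date, timedelta
from itertools import combinations

# Assumption: the abstract's "two months before / three months after"
# window is approximated here as 60 and 90 days around the self-report.
DAYS_BEFORE = 60
DAYS_AFTER = 90


def in_window(tweet_date, report_date):
    """Return True if a tweet falls inside the longitudinal window."""
    start = report_date - timedelta(days=DAYS_BEFORE)
    end = report_date + timedelta(days=DAYS_AFTER)
    return start <= tweet_date <= end


def symptom_counts(tweets, report_date):
    """Count symptom mentions and pairwise co-occurrence for one user.

    `tweets` is a hypothetical list of (date, set_of_symptom_concepts)
    pairs, i.e. the output of the upstream extraction and denoising steps.
    """
    prevalence = Counter()
    pairs = Counter()
    for tweet_date, symptoms in tweets:
        if not in_window(tweet_date, report_date):
            continue
        prevalence.update(symptoms)
        # Sort so each unordered pair is always counted under the same key.
        pairs.update(combinations(sorted(symptoms), 2))
    return prevalence, pairs


def retained_fraction(kept, total):
    """Share of symptom tweets kept after denoising."""
    return kept / total
```

Applied to the figures reported in the abstract, `retained_fraction(832_287, 2_120_964)` gives roughly 0.39, matching the stated 39%.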
Pages: 11
Related papers
50 in total
  • [1] Women's longitudinal social media behaviors and experiences during a global pandemic
    Vaterlaus, J. Mitchell
    Spruance, Lori A.
    Patten, Emily V.
    SOCIAL SCIENCE JOURNAL, 2023,
  • [2] Social Media and Students' Wellbeing: An Empirical Analysis during the Covid-19 Pandemic
    Tkacova, Hedviga
    Pavlikova, Martina
    Jenisova, Zita
    Maturkanic, Patrik
    Kralik, Roman
    SUSTAINABILITY, 2021, 13 (18)
  • [3] Social Media and Research Publication Activity During Early Stages of the COVID-19 Pandemic: Longitudinal Trend Analysis
    Taneja, Sonia L.
    Passi, Monica
    Bhattacharya, Sumona
    Schueler, Samuel A.
    Gurram, Sandeep
    Koh, Christopher
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2021, 23 (06)
  • [4] Social Media Insights Into US Mental Health During the COVID-19 Pandemic: Longitudinal Analysis of Twitter Data
    Valdez, Danny
    ten Thij, Marijn
    Bathina, Krishna
    Rutter, Lauren A.
    Bollen, Johan
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2020, 22 (12)
  • [5] Public health perinatal promotion during COVID-19 pandemic: a social media analysis
    Toluwanimi D. Durowaye
    Alexandra R. Rice
    Anne T. M. Konkle
    Karen P. Phillips
    BMC Public Health, 22
  • [6] Examination of Psychiatric Symptoms Caused by Exposure to Social Media During the COVID-19 Pandemic
    Eren, Hulya Kok
    Sagar, Mehmet Enes
    PSYCHIATRY AND BEHAVIORAL SCIENCES, 2022, 12 (03): 114 - 124
  • [7] Public health perinatal promotion during COVID-19 pandemic: a social media analysis
    Durowaye, Toluwanimi D.
    Rice, Alexandra R.
    Konkle, Anne T. M.
    Phillips, Karen P.
    BMC PUBLIC HEALTH, 2022, 22 (01)
  • [8] INTERNET ADDICTION AND INTERACTION OF SOCIAL MEDIA DURING THE PANDEMIC
    Petrella, Simone
    Morais, Ricardo
    Silveira, Patricia
    REVISTA DE CIENCIAS HUMANAS DA UNIVERSIDADE DE TAUBATE, 2022, 15 (01)
  • [9] Tracking Self-reported Symptoms and Medical Conditions on Social Media During the COVID-19 Pandemic: Infodemiological Study
    Ding, Qinglan
    Massey, Daisy
    Huang, Chenxi
    Grady, Connor B.
    Lu, Yuan
    Cohen, Alina
    Matzner, Pini
    Mahajan, Shiwani
    Caraballo, Cesar
    Kumar, Navin
    Xue, Yuchen
    Dreyer, Rachel
    Roy, Brita
    Krumholz, Harlan M.
    JMIR PUBLIC HEALTH AND SURVEILLANCE, 2021, 7 (09)
  • [10] The Korean Wave during the coronavirus pandemic: an analysis of social media activities in Indonesia
    Aritenang, Adiwan Fahlan
    Drianda, Riela Provi
    Kesuma, Meyriana
    Ayu, Nadia
    WORLD LEISURE JOURNAL, 2024, 66 (03) : 346 - 362