Sorting Through the Safety Data Haystack: Using Machine Learning to Identify Individual Case Safety Reports in Social-Digital Media

Cited by: 27
Authors
Comfort, Shaun [1]
Perera, Sujan [2]
Hudson, Zoe [1]
Dorrell, Darren [1]
Meireis, Shawman [1]
Nagarajan, Meenakshi [2]
Ramakrishnan, Cartic [2]
Fine, Jennifer [1]
Affiliations
[1] Genentech Inc, San Francisco, CA 94080 USA
[2] IBM Watson Hlth, Cambridge, MA USA
Keywords
ADVERSE DRUG-REACTIONS; PHARMACOVIGILANCE; PERSPECTIVES; AGREEMENT; FACEBOOK; SIGNALS;
DOI
10.1007/s40264-018-0641-7
Chinese Library Classification (CLC) number
R1 [Preventive Medicine, Hygiene];
Discipline classification code
1004 ; 120402 ;
Abstract
There is increasing interest in social digital media (SDM) as a data source for pharmacovigilance activities; however, SDM is considered a low-information-content source of safety data. Pharmacovigilance itself operates in a high-noise, lower-validity environment without objective 'gold standards' beyond process definitions, so introducing large volumes of SDM into the pharmacovigilance workflow could exacerbate the shortage of manual resources available for adverse event identification and processing. Recent advances in medical informatics have produced methods for developing programs that can assist human experts in detecting valid individual case safety reports (ICSRs) within SDM.
In this study, we developed rule-based and machine learning (ML) models for classifying ICSRs from SDM and compared their performance with that of human pharmacovigilance experts. We used a random sample from a collection of 311,189 SDM posts that mentioned Roche products and brands in combination with common medical and scientific terms, sourced from Twitter, Tumblr, Facebook, and a spectrum of news media blogs, to develop and evaluate three iterations of an automated ICSR classifier. The ICSR classifier models consisted of sub-components that annotate the relevant ICSR elements and a component that makes the final decision on the validity of the ICSR. Agreement with human pharmacovigilance experts was chosen as the preferred performance metric and was evaluated by calculating the Gwet AC1 statistic (gKappa). The best performing model was tested against the Roche global pharmacovigilance expert using a blind dataset and put through a time test of the full 311,189-post dataset.
The initial strict rule-based approach to ICSR classification resulted in a model with an accuracy of 65% and a gKappa of 46%. Adding an ML-based adverse event annotator improved the accuracy to 74% and the gKappa to 60%. Adding a further ML ICSR detector improved performance again. On a blind test set of 2500 posts, the final model demonstrated a gKappa of 78% and an accuracy of 83%. In the time test, the final model took 48 h to complete a task that would have taken human experts an estimated 44,000 h.
The results of this study indicate that an effective and scalable solution to the challenge of ICSR detection in SDM includes a workflow in which an automated ML classifier identifies likely ICSRs for further review by human subject matter experts (SMEs).
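The agreement metric named in the abstract, Gwet's AC1, can be illustrated with a minimal sketch for two raters (for example, a classifier and a human expert). This uses the standard two-rater AC1 formula and is not the authors' implementation, which the abstract does not describe. The function name `gwet_ac1` and the toy labels below are hypothetical, chosen only for illustration.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b, categories=None):
    """Gwet's AC1 agreement coefficient for two raters (standard formula).

    AC1 = (pa - pe) / (1 - pe), where pa is the observed agreement and
    pe = (1 / (q - 1)) * sum_k pi_k * (1 - pi_k), with pi_k the average
    of the two raters' marginal proportions for category k.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    if categories is None:
        categories = sorted(set(rater_a) | set(rater_b))
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")

    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    pa = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from the averaged marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    pe = 0.0
    for k in categories:
        pi_k = (counts_a[k] + counts_b[k]) / (2 * n)
        pe += pi_k * (1 - pi_k)
    pe /= q - 1

    return (pa - pe) / (1 - pe)


# Toy labels (hypothetical): 1 = valid ICSR, 0 = not an ICSR.
model_labels = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
expert_labels = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
ac1 = gwet_ac1(model_labels, expert_labels)
accuracy = sum(m == e for m, e in zip(model_labels, expert_labels)) / len(model_labels)
print(f"accuracy={accuracy:.2f}, AC1={ac1:.3f}")
```

On these toy labels the raw agreement is 0.80 while AC1 is lower, because AC1 discounts the agreement expected by chance. The same relationship appears in the reported results, where gKappa is lower than accuracy for each model.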
Pages: 579-590
Page count: 12