Mapping the plague through natural language processing

Cited by: 3
Authors
Krauer, Fabienne [1]
Schmid, Boris V. [1]
Affiliation
[1] Univ Oslo, Ctr Ecol & Evolutionary Synth, Dept Biosci, N-0316 Oslo, Norway
Keywords
Plague; Infectious diseases; Historical epidemiology; Outbreaks; Natural language processing; Machine learning
DOI
10.1016/j.epidem.2022.100656
Chinese Library Classification number
R51 [Infectious diseases]
Subject classification code
100401
Abstract
Pandemic diseases such as plague have produced a vast literature describing their spatiotemporal extent, transmission and countermeasures. However, manually extracting this information from running text is tedious, and much of it remains locked in narrative form. Natural language processing (NLP) is a promising tool for the automated extraction of epidemiological data and can facilitate the creation of datasets. In this paper, we explore how NLP can assist in building a plague outbreak dataset. We produced a gold-standard list of toponyms by manually annotating a German plague treatise published by Sticker in 1908. We then compared five pre-trained NLP libraries (Google, Stanford CoreNLP, spaCy, germaNER and Geoparser) against this gold standard, measuring how well each one automatically extracts location data. Of all tested algorithms, spaCy had the highest sensitivity (0.92; F1 score 0.83). Stanford CoreNLP followed closely (sensitivity 0.81) and had a higher F1 score (0.87). Google NLP performed slightly worse (sensitivity 0.78, F1 score 0.72). Geoparser and germaNER had poor sensitivity (0.41 and 0.61, respectively). We then evaluated how accurately automated geocoding services (Google geocoding, GeoNames and Geoparser) located these outbreaks. All geocoding services performed poorly, particularly for historical regions. They returned the correct GIS information in only 60.4%, 52.7% and 33.8% of cases, respectively. Finally, we compared our newly digitized plague dataset with a re-digitized version of Biraben's plague treatise and provide an updated spatiotemporal extent of plague outbreaks during the second pandemic. We conclude that NLP tools have their limitations but are potentially useful for accelerating data collection and building a global plague outbreak database.
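The sensitivity and F1 scores above come from comparing each tool's extracted toponyms with the manually annotated gold standard. The sketch below shows that comparison. It makes two simplifying assumptions: toponyms are matched as exact strings, and each unique toponym counts once. The paper's actual matching criteria may differ.

The toy data are hypothetical. "Pest" stands in for a false positive: the German word for plague could be mistaken for the city of Pest. The helper `implied_precision` is also illustrative. It inverts the standard definition F1 = 2PS/(P+S) to recover precision P from a reported sensitivity S and F1 score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    sensitivity: float  # recall: share of gold-standard toponyms that were found
    precision: float    # share of extracted toponyms that are in the gold standard
    f1: float           # harmonic mean of precision and sensitivity


def score_toponyms(gold: set[str], predicted: set[str]) -> Scores:
    """Score extracted toponyms against a manually annotated gold standard.

    Assumption: exact string matching on unique toponyms (a simplification).
    """
    tp = len(gold & predicted)  # correctly extracted toponyms
    fn = len(gold - predicted)  # gold toponyms the tool missed
    fp = len(predicted - gold)  # extracted strings not in the gold standard
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return Scores(sensitivity, precision, f1)


def implied_precision(sensitivity: float, f1: float) -> float:
    """Solve F1 = 2PS / (P + S) for P, given sensitivity S and F1 (F1 < 2S)."""
    return f1 * sensitivity / (2 * sensitivity - f1)


# Hypothetical example: one missed toponym and one false positive.
gold = {"Venedig", "Genua", "Marseille", "Konstantinopel"}
predicted = {"Venedig", "Genua", "Marseille", "Pest"}
print(score_toponyms(gold, predicted))
# Scores(sensitivity=0.75, precision=0.75, f1=0.75)
```

Applying `implied_precision` to the reported values, and assuming F1 is the usual harmonic mean, gives a precision of roughly 0.76 for spaCy (S = 0.92, F1 = 0.83). For Stanford CoreNLP (S = 0.81, F1 = 0.87) it gives roughly 0.94. That difference is consistent with Stanford CoreNLP scoring a higher F1 despite its lower sensitivity. Because the reported figures are rounded, these implied values are approximate.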
Pages: 8