Mapping the plague through natural language processing

Cited by: 3
Authors
Krauer, Fabienne [1]
Schmid, Boris V. [1]
Affiliation
[1] Univ Oslo, Ctr Ecol & Evolutionary Synth, Dept Biosci, N-0316 Oslo, Norway
Keywords
Plague; Infectious diseases; Historical epidemiology; Outbreaks; Natural language processing; Machine learning
DOI
10.1016/j.epidem.2022.100656
Chinese Library Classification number
R51 [Infectious diseases]
Discipline classification code
100401
Abstract
Pandemic diseases such as plague have produced a vast literature documenting their spatiotemporal extent, transmission and countermeasures. However, manually extracting such information from running text is tedious, and much of it remains locked in narrative form. Natural language processing (NLP) is a promising tool for the automated extraction of epidemiological data and can facilitate the construction of datasets. In this paper, we explore the utility of NLP for assisting in the creation of a plague outbreak dataset. We produced a gold-standard list of toponyms by manually annotating a German plague treatise published by Sticker in 1908. We then compared the performance of five pre-trained NLP libraries (Google, Stanford CoreNLP, spaCy, germaNER and Geoparser) in automatically extracting location data against this gold standard. Of all tested algorithms, spaCy performed best (sensitivity 0.92, F1 score 0.83), followed closely by Stanford CoreNLP (sensitivity 0.81, F1 score 0.87). Google NLP performed slightly worse (sensitivity 0.78, F1 score 0.72). Geoparser and germaNER had poor sensitivity (0.41 and 0.61, respectively). Next, we evaluated how well automated geocoding services (Google geocoding, Geonames and Geoparser) located these outbreaks. All geocoding services performed poorly, particularly for historical regions, returning the correct GIS information in only 60.4%, 52.7% and 33.8% of cases, respectively. Finally, we compared our newly digitized plague dataset with a re-digitized version of Biraben's plague treatise and provide an updated account of the spatiotemporal extent of outbreaks during the second plague pandemic. We conclude that NLP tools have their limitations but are potentially useful for accelerating data collection and building a global plague outbreak database.
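Each library's extracted toponyms are scored against the manually annotated gold standard using sensitivity (recall) and the F1 score, the harmonic mean of precision and sensitivity. The following is a minimal Python sketch of such a comparison. It assumes toponyms are matched by exact string and counted per mention; the matching rules actually used in the study are not given here, and the function name `evaluate_toponyms` and the sample place names are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable


def evaluate_toponyms(gold: Iterable[str], predicted: Iterable[str]) -> dict[str, float]:
    """Score extracted toponyms against a gold standard (hypothetical helper).

    Assumption: a prediction counts as correct only if its string exactly
    matches a gold-standard toponym. Counter intersection matches mentions
    one-to-one, so a place listed twice in the gold standard must be
    extracted twice to be fully credited.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)

    tp = sum((gold_counts & pred_counts).values())  # matched mentions
    fp = sum(pred_counts.values()) - tp  # extracted but not in the gold standard
    fn = sum(gold_counts.values()) - tp  # in the gold standard but missed

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"sensitivity": sensitivity, "precision": precision, "f1": f1}


# Hypothetical example: one gold toponym is missed ("Konstantinopel") and one
# spurious hit is returned ("Pest", which is also the German word for plague).
gold_list = ["Venedig", "Marseille", "Konstantinopel", "Moskau"]
extracted = ["Venedig", "Marseille", "Moskau", "Pest"]
print(evaluate_toponyms(gold_list, extracted))
# → {'sensitivity': 0.75, 'precision': 0.75, 'f1': 0.75}
```

Because F1 combines precision and sensitivity, a library with high sensitivity can still have a lower F1 than one with fewer hits if it also returns more false positives.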
Pages: 8