Automatized spatio-temporal detection of drought impacts from newspaper articles using natural language processing and machine learning

Cited by: 14
Authors
Sodoge, Jan [1, 2]
Kuhlicke, Christian [1, 2]
de Brito, Mariana Madruga [1]
Affiliations
[1] UFZ Helmholtz Ctr Environm Res, Dept Urban & Environm Sociol, D-04318 Leipzig, Germany
[2] Univ Potsdam, Inst Environm Sci & Geog, D-14476 Potsdam, Germany
Source
WEATHER AND CLIMATE EXTREMES | 2023 / Vol. 41
Keywords
Germany; Drought; NLP; Text mining; Machine learning; Natural hazards; Socio-economic impacts; Longitudinal study; BIG DATA; TEXT; EVENTS; FOREST;
DOI
10.1016/j.wace.2023.100574
Chinese Library Classification number
P4 [Atmospheric Sciences (Meteorology)];
Discipline classification code
0706 ; 070601 ;
Abstract
Droughts are expected to increase both in terms of frequency and magnitude across Europe. Despite the multitude of adverse effects these disasters impose on social-ecological systems, most impact assessments are constrained to single event and/or single sector analyses. Furthermore, existing longitudinal multi-sectoral datasets are limited in spatiotemporal homogeneity and scope, resulting in fragmented datasets. To address this gap, we propose a novel method for the automatized detection of drought impacts based on newspaper articles. We employ natural language processing (NLP) and machine learning to identify different socio-economic impacts (e.g. agriculture, forestry, livestock, fires) and their geographic and temporal scope from 40,000 newspaper articles reporting about droughts in Germany between 2000 and 2021. Our method is able to track impacts over long time periods, allowing us to assess how drought impacts evolve. Accuracy levels of 92-96% per impact class were obtained for the automatic classification of the impacts when evaluated on a human-annotated dataset. Furthermore, our resulting impact dataset can replicate both temporal and spatial trends when validated against independent impact and hazard data. Overall, the proposed approach advances current research as it (1) requires a significantly lower workload than conventional impact assessment methods, (2) allows addressing large text datasets, (3) reduces subjectivity and human bias, (4) is generalizable to other hazard types as well as text corpora, and (5) achieves sufficient levels of accuracy. The findings highlight the applicability of NLP and machine learning to create comprehensive longitudinal impact datasets.
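The per-class accuracy evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes each article carries a set of impact labels from a human annotator and a set from the classifier, and that accuracy for a class is the share of articles where both agree on the presence or absence of that class. The class list, the function name `per_class_accuracy`, and the toy data are hypothetical and are not taken from the paper, which may define and compute its accuracy differently.

```python
from collections import defaultdict

# Hypothetical impact classes, borrowed from the examples named in the abstract.
IMPACT_CLASSES = ["agriculture", "forestry", "livestock", "fires"]


def per_class_accuracy(gold, predicted, classes=IMPACT_CLASSES):
    """Return {class: accuracy} for multi-label annotations.

    gold and predicted are equal-length lists of sets of class names,
    one set per article. An article counts as correct for a class when
    the classifier and the annotator agree on presence or absence.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("need at least one annotated article")
    correct = defaultdict(int)
    for gold_labels, predicted_labels in zip(gold, predicted):
        for c in classes:
            if (c in gold_labels) == (c in predicted_labels):
                correct[c] += 1
    return {c: correct[c] / len(gold) for c in classes}


# Toy data, invented for illustration only (not from the paper's dataset).
gold = [{"agriculture"}, {"forestry", "fires"}, set(), {"livestock"}]
predicted = [{"agriculture"}, {"forestry"}, set(), {"livestock", "agriculture"}]
scores = per_class_accuracy(gold, predicted)
```

Reporting one score per class, rather than a single pooled figure, keeps a rarely reported impact type from being hidden behind the frequent ones.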
Pages: 9