FF-BERT: A BERT-based ensemble for automated classification of web-based text on flash flood events

Cited by: 11
Authors
Wilkho, Rohan Singh [1]
Chang, Shi [2]
Gharaibeh, Nasir G. [1]
Affiliations
[1] Texas A&M Univ, Zachry Dept Civil & Environm Engn, College Stn, TX 77840 USA
[2] Trimble Inc, Westminster, CO 80021 USA
Keywords
Flash flood; Text classification; Multi-label text classification; BERT
DOI
10.1016/j.aei.2023.102293
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The web is a rich information repository that can be mined to uncover additional data about past flash flood (FF) events that is currently missing from existing structured databases. However, this information originates from multiple sources (news articles, government records, and weather records, among others) and may cover several topics. Furthermore, these topics may be disproportionately covered on the web. The large size and heterogeneous nature of web information render manual review difficult. To address this challenge, we have developed a multi-label text classification model, FF-BERT. FF-BERT is designed to classify FF-related web paragraphs into one or more of seven categories: (1) Damage and Economic Impact (DI), (2) Fatalities, Injuries, and Rescue (FIR), (3) Hydrometeorology (HM), (4) Warning and Emergency (WE), (5) Response and Recovery (RR), (6) Public Health (PH), and (7) Mitigation (MG). To develop FF-BERT, we labeled 21,180 paragraphs from FF-related webpages and performed experiments with multiple model architectures based on the widely used language model Bidirectional Encoder Representations from Transformers (BERT). Our final model outperforms the baseline by 11.83%, as measured by the micro-F1 score. In addition, FF-BERT significantly improves the prediction of minority labels (RR: 32.1%, PH: 260.4%, and MG: 138.6%). We demonstrate using real-world examples that FF-BERT can be used to uncover new information about flash flood events. This information can be used to enhance existing databases, such as NOAA's Storm Events Database.
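The abstract reports results as a micro-F1 score over a multi-label task. As a minimal sketch of what that metric measures, the snippet below pools true positives, false positives, and false negatives across all seven categories before computing F1. The function name `micro_f1`, the set-based label representation, and the toy paragraphs are illustrative assumptions; they are not taken from the paper and do not reproduce its evaluation code or data.

```python
from typing import List, Set

# The seven category abbreviations named in the abstract.
LABELS = ["DI", "FIR", "HM", "WE", "RR", "PH", "MG"]


def micro_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label data (hypothetical helper).

    Each element is the set of labels assigned to one paragraph.
    Counts are pooled over all paragraphs and labels, so frequent
    labels weigh more than rare ones.
    """
    tp = fp = fn = 0
    for true, pred in zip(y_true, y_pred):
        tp += len(true & pred)   # labels predicted and correct
        fp += len(pred - true)   # labels predicted but wrong
        fn += len(true - pred)   # labels missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example (made-up labels, not data from the paper).
y_true = [{"DI", "HM"}, {"FIR"}, {"RR", "MG"}]
y_pred = [{"DI"}, {"FIR", "WE"}, {"RR", "MG"}]
score = micro_f1(y_true, y_pred)
print(round(score, 3))
```

Because the counts are pooled, a high micro-F1 can coexist with weak performance on rare labels, which is presumably why the abstract reports the minority labels (RR, PH, MG) separately.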
Pages: 12