FF-BERT: A BERT-based ensemble for automated classification of web-based text on flash flood events

Cited by: 11
Authors
Wilkho, Rohan Singh [1]
Chang, Shi [2]
Gharaibeh, Nasir G. [1]
Affiliations
[1] Texas A&M Univ, Zachry Dept Civil & Environm Engn, College Stn, TX 77840 USA
[2] Trimble Inc, Westminster, CO 80021 USA
Keywords
Flash flood; Text classification; Multi-label text classification; BERT
DOI
10.1016/j.aei.2023.102293
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The web is a rich information repository that can be mined to uncover additional data about past flash flood (FF) events that are currently missing from existing structured databases. However, this information originates from multiple sources (news articles, government records, and weather records, among others) and may cover several topics. Furthermore, these topics may be disproportionately covered on the web. The large size and heterogeneous nature of web information render manual review difficult. To address this challenge, we have developed a multi-label text classification model, FF-BERT. FF-BERT is designed to classify FF-related web paragraphs into one or more of seven categories: (1) Damage and Economic Impact (DI), (2) Fatalities, Injuries, and Rescue (FIR), (3) Hydrometeorology (HM), (4) Warning and Emergency (WE), (5) Response and Recovery (RR), (6) Public Health (PH), and (7) Mitigation (MG). To develop FF-BERT, we labeled 21,180 paragraphs from FF-related webpages and performed experiments with multiple model architectures based on the widely used language model Bidirectional Encoder Representations from Transformers (BERT). Our final model outperforms the baseline by 11.83%, as measured by the micro-F1 score. In addition, FF-BERT significantly improves the prediction of minority labels (RR: 32.1%, PH: 260.4%, and MG: 138.6%). We demonstrate using real-world examples that FF-BERT can be used to uncover new information about flash flood events. This information can be used to enhance existing databases, such as NOAA's Storm Events Database.
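As a rough illustration of the evaluation metric named in the abstract, the sketch below computes a micro-averaged F1 score for multi-label predictions over the seven category abbreviations. It is not the authors' code: the helper names `encode` and `micro_f1` and the three toy paragraphs are hypothetical, chosen only to show how true positives, false positives, and false negatives are pooled across all labels before the score is computed.

```python
# Label abbreviations come from the abstract; everything else here is a
# hypothetical toy example, not data or code from the paper.
LABELS = ["DI", "FIR", "HM", "WE", "RR", "PH", "MG"]


def encode(tags):
    """Turn a set of label abbreviations into a 0/1 indicator row."""
    return [1 if label in tags else 0 for label in LABELS]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over every (paragraph, label) pair."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Three made-up paragraphs, each with one or more gold labels.
y_true = [encode({"DI", "FIR"}), encode({"HM"}), encode({"RR", "MG"})]
y_pred = [encode({"DI"}), encode({"HM", "WE"}), encode({"RR", "MG"})]

score = micro_f1(y_true, y_pred)
print(f"micro-F1 = {score:.2f}")  # 4 TP, 1 FP, 1 FN -> 0.80
```

Because the counts are pooled before averaging, frequent labels dominate the micro-F1 score, which is presumably why the abstract also reports per-label gains for the minority categories (RR, PH, MG) separately.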
Pages: 12