FF-BERT: A BERT-based ensemble for automated classification of web-based text on flash flood events

Cited by: 11

Authors:

Wilkho, Rohan Singh [1]

Chang, Shi [2]

Gharaibeh, Nasir G. [1]

Affiliations:

[1] Texas A&M Univ, Zachry Dept Civil & Environm Engn, College Stn, TX 77840 USA

[2] Trimble Inc, Westminster, CO 80021 USA

Source:

ADVANCED ENGINEERING INFORMATICS | 2024 / Vol. 59

Keywords:

Flash flood; Text classification; Multi-label text classification; BERT

DOI:

10.1016/j.aei.2023.102293

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The web is a rich information repository that can be mined to uncover additional data about past flash flood (FF) events that are currently missing from existing structured databases. However, this information originates from multiple sources (news articles, government records, and weather records, among others) and may cover several topics. Furthermore, these topics may be disproportionately covered on the web. The large size and heterogeneous nature of web information render manual review difficult. To address this challenge, we have developed a multi-label text classification model, FF-BERT. FF-BERT is designed to classify FF-related web paragraphs into one or more of seven categories: (1) Damage and Economic Impact (DI), (2) Fatalities, Injuries, and Rescue (FIR), (3) Hydrometeorology (HM), (4) Warning and Emergency (WE), (5) Response and Recovery (RR), (6) Public Health (PH), and (7) Mitigation (MG). To develop FF-BERT, we labeled 21,180 paragraphs from FF-related webpages and performed experiments with multiple model architectures based on the widely used language model Bidirectional Encoder Representations from Transformers (BERT). Our final model outperforms the baseline by 11.83%, as measured by the micro-F1 score. In addition, FF-BERT significantly improves the prediction of minority labels (RR: 32.1%, PH: 260.4%, and MG: 138.6%). We demonstrate using real-world examples that FF-BERT can be used to uncover new information about flash flood events. This information can be used to enhance existing databases, such as NOAA's Storm Events Database.
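
For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of how a micro-F1 score can be computed for multi-label predictions over the seven categories. It is not the authors' code: the toy paragraphs, the predictions, and the helper names `micro_f1` and `relative_improvement` are hypothetical, and treating the reported gain as a relative improvement is an assumption, since the abstract does not say whether the figure is relative or absolute.

```python
# Category abbreviations taken from the abstract.
LABELS = ["DI", "FIR", "HM", "WE", "RR", "PH", "MG"]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives and false negatives
    over every (paragraph, label) decision, then compute a single F1."""
    tp = fp = fn = 0
    for true_labels, pred_labels in zip(y_true, y_pred):
        tp += len(true_labels & pred_labels)   # labels correctly predicted
        fp += len(pred_labels - true_labels)   # labels predicted but wrong
        fn += len(true_labels - pred_labels)   # labels missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def relative_improvement(new_score, base_score):
    """Percentage gain of new_score over base_score (assumed definition)."""
    return 100.0 * (new_score - base_score) / base_score


# Hypothetical toy data: each set holds the labels of one paragraph.
y_true = [{"DI", "HM"}, {"FIR"}, {"RR", "MG"}, {"HM"}]
y_base = [{"DI"}, {"FIR", "HM"}, {"RR"}, {"HM"}]   # made-up baseline output
y_new = [{"DI", "HM"}, {"FIR"}, {"RR"}, {"HM"}]    # made-up improved output

base = micro_f1(y_true, y_base)
new = micro_f1(y_true, y_new)
print(f"baseline micro-F1: {base:.4f}")
print(f"improved micro-F1: {new:.4f}")
print(f"relative gain: {relative_improvement(new, base):.2f}%")
```

Because micro-averaging pools counts across all labels, frequent categories dominate the score, which is one reason per-label results for minority categories such as RR, PH, and MG are worth reporting separately.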


Pages: 12

50 items in total

[21] BERT-based Regression Model for Micro-edit Humor Classification Task
Chen, Yuancheng
Hou, Yi
Ye, Deqiang
Yu, Yuehang
2021 INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, INFORMATION AND COMMUNICATION ENGINEERING, 2021, 11933
[22] BERT-Based Dual-Channel Power Equipment Defect Text Assessment Model
Zhou, Zhenan
Zhang, Chuyan
Liang, Xinyi
Liu, Huifang
Diao, Mingguang
Deng, Yu
IEEE ACCESS, 2024, 12 : 134020 - 134026
[23] BVMHA: Text classification model with variable multihead hybrid attention based on BERT
Peng, Bo
Zhang, Tao
Han, Kundong
Zhang, Zhe
Ma, Yuquan
Ma, Mengnan
JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2024, 46 (01) : 1443 - 1454
[24] Improving the Accuracy and Effectiveness of Text Classification Based on the Integration of the Bert Model and a Recurrent Neural Network (RNN_Bert_Based)
Eang, Chanthol
Lee, Seungjae
APPLIED SCIENCES-BASEL, 2024, 14 (18):
[25] BERT-based NLP techniques for classification and severity modeling in basic warranty data study
Xu, Shuzhe
Zhang, Chuanlong
Hong, Don
INSURANCE MATHEMATICS & ECONOMICS, 2022, 107 : 57 - 67
[26] Advancements in Text Subjectivity Analysis: From Simple Approaches to BERT-Based Models and Generalization Assessments
Antal, Margit
Buza, Krisztian
Nemes, Szilard
ADVANCES IN COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2024, PART I, 2024, 2165 : 245 - 255
[27] A Multiscale Interactive Attention Short Text Classification Model Based on BERT
Zhou, Lu
Wang, Peng
Zhang, Huijun
Wu, Shengbo
Zhang, Tao
IEEE ACCESS, 2024, 12 : 160992 - 161001
[28] An Efficient Long Chinese Text Sentiment Analysis Method Using BERT-Based Models with BiGRU
Sheng, Deming
Yuan, Jingling
PROCEEDINGS OF THE 2021 IEEE 24TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN (CSCWD), 2021, : 192 - 197
[29] A UNIVERSAL BERT-BASED FRONT-END MODEL FOR MANDARIN TEXT-TO-SPEECH SYNTHESIS
Bai, Zilong
Hu, Beibei
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6074 - 6078
[30] Multi-label text classification of cardiovascular drug attributes based on BERT and BiGRU
Cui H.
Zhang L.
Zhu X.
Guo X.
Peng Y.
Journal of Intelligent and Fuzzy Systems, 2024, 46 (04) : 10683 - 10693
