FF-IR: An information retrieval system for flash flood events developed by integrating public-domain data and machine learning

Citations: 6
Authors
Wilkho, Rohan Singh [1]
Gharaibeh, Nasir G. [1]
Chang, Shi [1]
Zou, Lei [2]
Affiliations
[1] Texas A&M Univ, Zachry Dept Civil & Environm Engn, College Stn, TX 77840 USA
[2] Texas A&M Univ, Dept Geog, College Stn, TX 77840 USA
Funding
U.S. National Science Foundation
Keywords
Flash flood; Information retrieval; Storm events data; Machine learning; Big data; Classification; SMOTE
DOI
10.1016/j.envsoft.2023.105734
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Structured databases on flash flood (FF) events contain limited information and lack emerging data (e.g., visual media). The web is rich with information that can bridge this gap, but search engines return long lists of webpages cluttered with commercial and irrelevant content. To address this challenge, we developed an FF information retrieval (IR) system, FF-IR, which uses machine learning (ML) models in novel ways to automate and enhance the IR process. FF-IR works in three steps: it (1) creates event-specific search queries from the publicly available Storm Events dataset and sends them to Google to collect candidate webpages; (2) transforms the candidate webpages into relevance features; and (3) classifies each candidate webpage as relevant or non-relevant using our ML models. Measured by the F2-score, FF-IR outperforms direct Google searches by over 100%. Natural hazard researchers and practitioners can use FF-IR to facilitate FF risk assessments and mitigation planning.
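The abstract reports performance as an F2-score, the F-beta measure with beta = 2, which weights recall more heavily than precision. The sketch below computes it from confusion counts using the standard formula F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The function name `f_beta` and the counts in the usage note are illustrative assumptions and are not taken from the paper.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """Return the F-beta score from true-positive, false-positive,
    and false-negative counts. beta=2 gives the F2-score."""
    # Precision: share of retrieved webpages that are relevant.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of relevant webpages that were retrieved.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    high_recall = f_beta(tp=5, fp=5, fn=0)     # precision 0.5, recall 1.0
    high_precision = f_beta(tp=5, fp=0, fn=5)  # precision 1.0, recall 0.5
    print(round(high_recall, 3), round(high_precision, 3))
```

With these hypothetical counts, the high-recall case scores higher than the high-precision case, which is the behavior one would want when missing a relevant webpage is costlier than keeping an irrelevant one.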
Pages: 12