Estimating the grade of storm surge disaster loss in coastal areas of China via machine learning algorithms

Cited by: 19
Authors
Zhang, Suming [1]
Zhang, Jie [1,2]
Li, Xiaomin [2]
Du, Xuexue [1]
Zhao, Tangqi [1]
Hou, Qi [1]
Jin, Xifang [3]
Affiliations
[1] China Univ Petr East China, Coll Oceanog & Space Informat, Qingdao 266580, Peoples R China
[2] Minist Nat Resources China, Inst Oceanog 1, Qingdao 266061, Peoples R China
[3] State Ocean Adm, North Sea Marine Forecast Ctr, Qingdao 266001, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Storm surge disaster loss; Machine learning algorithms; Indicator screening; Model interpretability; RANDOM FOREST; SPATIAL PREDICTION; RISK-ASSESSMENT; VULNERABILITY; REGRESSION; MODEL; CLASSIFICATION; SEA;
DOI
10.1016/j.ecolind.2022.108533
Chinese Library Classification number
X176 [Biodiversity conservation];
Discipline classification code
090705;
Abstract
Storm surge is the most severe marine disaster in China and affects the entire coastline. Estimating storm surge disaster loss (SSDL) is important for disaster prevention, sustainability, and decision-making. Taking 11 provincial administrative regions along the coast of China as the study area, this paper estimates SSDL grades using four machine learning (ML) algorithms. A total of 132 official, openly available records of storm surge disasters were collected and divided into a cross-validation (CV) set and a test set. First, a comprehensive system of 50 indicators was constructed from three perspectives: the hazard of disaster-causing factors (16 indicators), and the vulnerability (22) and resilience (12) of disaster-bearing bodies. Several data preprocessing methods, such as normalization and SMOTE, were applied to improve model performance. Then, Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Logistic Model Tree (LMT), and K-star were used to construct estimation models of SSDL grades. Principal component analysis (PCA) and recursive feature elimination (RFE) were adopted to screen the indicators. Finally, the models' performance was compared using Precision, Recall, F1 score, and Kappa. The results show that sound data preparation underpins the reliability and stability of the models. RFE proved more suitable than PCA for indicator selection in this study. The RFE importance ranking enhances the interpretability of the ML models and shows that hazard indicators are the most important, followed by vulnerability indicators, with resilience indicators the least important. The 27-indicator K-star model, which offers accurate estimation, strong generalization, and a lower workload, is the optimal SSDL estimation model. Its CV Precision, Recall, F1 score, and Kappa are 0.838, 0.832, 0.827, and 0.776, and its test-set Precision, Recall, F1 score, and Kappa are 0.819, 0.786, 0.781, and 0.714, respectively. This paper provides a scientific basis for government decision-making and risk management and can serve as a demonstration case for SSDL research.
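The four evaluation metrics named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the abstract does not state how per-class scores were averaged, so macro averaging is assumed here, and the function names and the toy grade labels are hypothetical.

```python
from collections import Counter


def macro_precision_recall_f1(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 for multi-class labels.

    Assumption: macro averaging (unweighted mean over classes); the
    abstract does not say which averaging scheme the paper used.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))  # counts of (true, predicted) pairs
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = pairs[(c, c)]
        predicted_c = sum(n for (t, p), n in pairs.items() if p == c)
        actual_c = sum(n for (t, p), n in pairs.items() if t == c)
        prec = tp / predicted_c if predicted_c else 0.0
        rec = tp / actual_c if actual_c else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(labels)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_freq = Counter(y_true)
    pred_freq = Counter(y_pred)
    # Chance agreement from the marginal label frequencies.
    expected = sum(true_freq[c] * pred_freq[c] for c in true_freq) / (n * n)
    if expected == 1.0:
        # Degenerate case: both sequences contain one identical class.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical toy loss grades, not data from the paper.
    y_true = [1, 1, 2, 2, 3, 3]
    y_pred = [1, 2, 2, 2, 3, 1]
    print(macro_precision_recall_f1(y_true, y_pred))
    print(cohen_kappa(y_true, y_pred))
```

Kappa is reported alongside the other three scores because it discounts agreement that would occur by chance, which matters when loss grades are unevenly represented.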
Pages: 19