Flood Susceptibility Assessment with Random Sampling Strategy in Ensemble Learning (RF and XGBoost)

Cited by: 22
Authors
Ren, Hancheng [1, 2]
Pang, Bo [1, 2]
Bai, Ping [3]
Zhao, Gang [4]
Liu, Shu [5, 6]
Liu, Yuanyuan [5, 6]
Li, Min [5, 6]
Affiliations
[1] Beijing Normal Univ, Coll Water Sci, Beijing 100875, Peoples R China
[2] Beijing Key Lab Urban Hydrol Cycle & Sponge City T, Beijing 100875, Peoples R China
[3] Kunming Flood Control & Drought Relief Headquarter, Kunming 650000, Peoples R China
[4] Univ Tokyo, Inst Ind Sci, Tokyo 1538505, Japan
[5] China Inst Water Resources & Hydropower Res, Beijing 100038, Peoples R China
[6] Minist Water Resources, Res Ctr Flood & Drought Disaster Reduct, Beijing 100038, Peoples R China
Keywords
flood susceptibility; ensemble learning; random sampling strategies; mountainous urban areas; Artificial Neural Network (ANN); Support Vector Machine (SVM); IMPACT; AREAS; MODELS; INDEX
DOI
10.3390/rs16020320
Chinese Library Classification
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Because urban and mountainous floods interact in complex ways, assessing flood susceptibility in mountainous urban areas is a challenging task in environmental research and risk analysis. Data-driven machine learning methods can evaluate flood susceptibility in mountainous urban areas that lack essential hydrological data by using remote sensing data and limited historical inundation records. In this study, two ensemble learning algorithms, Random Forest (RF) and XGBoost, were adopted to assess the flood susceptibility of Kunming, a typical mountainous urban area prone to severe flood disasters. A flood inventory was created using flood observations from 2018 to 2022. The spatial database included 10 explanatory factors, encompassing climatic, geomorphic, and anthropogenic factors. Artificial Neural Network (ANN) and Support Vector Machine (SVM) were selected for model comparison. To minimize the influence of expert opinions on model training, this study selected negative samples by uniform random sampling in historically non-flooded areas. The results demonstrated that (1) ensemble learning algorithms offer higher accuracy than the other machine learning methods, with RF achieving the highest area under the curve (AUC) of 0.87, followed by XGBoost at 0.84, both surpassing ANN (0.83) and SVM (0.82); (2) the interpretability of ensemble learning highlighted the differences in the potential distribution of the positive and negative samples in the training data. Feature importance in ensemble learning can be used to minimize human bias in the collection of flooded-site samples, and more targeted flood susceptibility maps of the study area's road network were obtained; and (3) ensemble learning algorithms exhibited greater stability and robustness on datasets with varied negative samples, as evidenced by their F1-Score, Kappa, and AUC metrics. This paper further substantiates the superiority of ensemble learning in flood susceptibility assessment from the perspectives of accuracy, interpretability, and robustness, enhances the understanding of the impact of negative samples on such assessments, and optimizes the specific process for urban flood susceptibility assessment using data-driven methods.
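The negative-sampling strategy and two of the evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The grid of (row, column) cells, the function names sample_negatives and f1_and_kappa, the 1:1 ratio of negative to positive samples, and the fixed seed are all assumptions made for illustration; the abstract does not specify them. AUC is omitted because it requires predicted probabilities from a trained model.

```python
import random


def sample_negatives(all_cells, flooded_cells, n_samples, seed=0):
    """Draw n_samples cells uniformly at random from cells with no flood record.

    Hypothetical helper: cells are any hashable identifiers, here (row, col).
    """
    flooded = set(flooded_cells)
    candidates = [cell for cell in all_cells if cell not in flooded]
    if n_samples > len(candidates):
        raise ValueError("not enough non-flooded cells to sample from")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(candidates, n_samples)  # sampling without replacement


def f1_and_kappa(y_true, y_pred):
    """Compute F1-Score and Cohen's Kappa for binary labels (1 = flooded)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    observed = (tp + tn) / n  # observed agreement
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 0.0
    return f1, kappa


# Toy study area: a 10 x 10 grid with three recorded flood sites (made-up data).
grid = [(r, c) for r in range(10) for c in range(10)]
flood_sites = [(0, 0), (1, 1), (2, 2)]
negatives = sample_negatives(grid, flood_sites, len(flood_sites), seed=42)

# Toy labels and predictions, only to exercise the metric function.
f1, kappa = f1_and_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

Repeating the draw with different seeds gives several training sets that differ only in their negative samples, which is one way to probe the stability described in result (3).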
Pages: 18