Machine learning with a susceptibility index-based sampling strategy for landslide susceptibility assessment

Times cited: 8

Authors:

Liu, Lei-Lei [1]

Zhang, Yi-Li [1]

Zhang, Shao-He [1]

Shu, Biao [1]

Xiao, Ting [1]

Affiliation:

[1] Cent South Univ, Sch Geosci & Infophys, Minist Educ, Key Lab Metallogen Predict Nonferrous Met & Geol, Changsha, Peoples R China

Source:

GEOCARTO INTERNATIONAL | 2022 / Vol. 37 / Issue 27

Funding:

National Natural Science Foundation of China

Keywords:

Landslide susceptibility; susceptibility index; machine learning; sampling strategy; SUPPORT VECTOR MACHINE; LOGISTIC-REGRESSION; RANDOM FOREST; SPATIAL PREDICTION; FREQUENCY RATIO; DECISION TREE; ABSENCE DATA; MODELS; CLASSIFICATION; SELECTION;

DOI:

10.1080/10106049.2022.2102221

Chinese Library Classification (CLC) number:

X [Environmental Science, Safety Science]

Discipline code:

08; 0830

Abstract:

Past landslides recorded in a landslide inventory are essential for machine learning (ML)-based landslide susceptibility assessment (LSA). They determine not only the positive samples (the past landslides themselves) but also the negative samples (randomly generated with reference to the past landslides) used to train and validate the ML models. However, the number of recorded landslides is often limited by constraints on time, budget and available resources. The data available for building landslide susceptibility ML models are therefore limited, which can leave the resulting models with insufficient accuracy and reliability. This article therefore proposes a new sampling strategy based on the landslide susceptibility index that enriches the positive and negative samples used for model training and validation, leading to an improved ML-based LSA. To implement this idea, three ML models, i.e., random forest (RF), gradient boosting decision tree (GBDT) and support vector machine (SVM), are first applied to initial datasets compiled from the landslide inventory to obtain initial landslide susceptibility indices at different spatial locations. Then, the proposed sampling strategy takes grid units with very high and very low landslide susceptibility indices directly as potential positive and negative samples, respectively, and adds them to the initial dataset; the number of these positive/negative samples is determined by a Bayesian optimization algorithm. Thereafter, the ML models are updated with the enriched datasets. Finally, to verify the effectiveness of the proposed strategy, the improved models are applied to assess the landslide susceptibility of Taojiang County, China, and the results are compared with those of the corresponding initial models without updating.
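The abstract describes the enrichment step only at a high level. The following is a minimal Python sketch of how it might look. It assumes that the initial model's susceptibility indices are available as a dictionary that maps hypothetical grid-unit ids to values in [0, 1]. `select_pseudo_samples`, `choose_counts` and the `score` callback are illustrative names, not the authors' code. The paper chooses the sample counts with Bayesian optimization. To keep the example self-contained, this sketch substitutes a plain exhaustive grid search over candidate counts. It leaves to the caller the callback that would retrain RF, GBDT or SVM and return a validation score.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical callback: given pseudo-positive and pseudo-negative cell ids,
# retrain a model on the enriched dataset and return a validation score (e.g. AUC).
ScoreFn = Callable[[List[str], List[str]], float]


def select_pseudo_samples(
    susceptibility: Dict[str, float],
    labelled: Set[str],
    n_pos: int,
    n_neg: int,
) -> Tuple[List[str], List[str]]:
    """Pick the n_pos highest- and n_neg lowest-index unlabelled grid units."""
    # Grid units already in the inventory-based dataset are not re-sampled.
    candidates = [(cid, s) for cid, s in susceptibility.items() if cid not in labelled]
    if n_pos < 0 or n_neg < 0 or n_pos + n_neg > len(candidates):
        raise ValueError("requested more pseudo-samples than unlabelled grid units")
    # Sort ascending by index; ties are broken by cell id so results are deterministic.
    ranked = sorted(candidates, key=lambda item: (item[1], item[0]))
    negatives = [cid for cid, _ in ranked[:n_neg]]
    # Slice from len - n_pos so that n_pos == 0 yields an empty list.
    positives = [cid for cid, _ in reversed(ranked[len(ranked) - n_pos:])]
    return positives, negatives


def choose_counts(
    susceptibility: Dict[str, float],
    labelled: Set[str],
    candidate_counts: Sequence[int],
    score: ScoreFn,
) -> Tuple[int, int]:
    """Exhaustive grid search over (n_pos, n_neg): a stand-in for Bayesian optimization."""
    n_free = sum(1 for cid in susceptibility if cid not in labelled)
    best: Tuple[int, int] = (0, 0)
    best_score = float("-inf")
    for n_pos in candidate_counts:
        for n_neg in candidate_counts:
            if n_pos + n_neg > n_free:
                continue  # top and bottom selections would overlap
            pos, neg = select_pseudo_samples(susceptibility, labelled, n_pos, n_neg)
            value = score(pos, neg)
            if value > best_score:
                best, best_score = (n_pos, n_neg), value
    return best


if __name__ == "__main__":
    index = {"c1": 0.95, "c2": 0.10, "c3": 0.80, "c4": 0.05, "c5": 0.50, "c6": 0.99}
    labelled = {"c6"}  # e.g. a cell already mapped as a landslide

    print(select_pseudo_samples(index, labelled, n_pos=2, n_neg=2))
    # prints (['c1', 'c3'], ['c4', 'c2'])

    def toy_score(pos: List[str], neg: List[str]) -> float:
        # Toy score standing in for "retrain and validate"; it peaks at (1, 2).
        return -abs(len(pos) - 1) - abs(len(neg) - 2)

    print(choose_counts(index, labelled, [0, 1, 2], toy_score))  # prints (1, 2)
```

In a real workflow, the score callback is where retraining happens, so each evaluation is expensive. That cost is the usual reason to prefer a sample-efficient search such as Bayesian optimization over the exhaustive loop shown here.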
The results show that, compared with the initial RF, GBDT and SVM models, the corresponding improved models perform better in terms of accuracy, precision, recall and specificity. In particular, the AUC values of the three models increase by 11.94%, 10.72% and 0.57%, respectively.
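The evaluation metrics named in the results have standard confusion-matrix definitions. Below is a small plain-Python sketch. It assumes binary labels (1 = landslide, 0 = non-landslide) and a 0.5 cut-off on the predicted susceptibility index. The threshold is an assumption, since the abstract does not state one. AUC is computed as the probability that a randomly chosen positive unit receives a higher index than a randomly chosen negative one, with ties counted as one half. This quantity equals the area under the ROC curve. The function names are illustrative.

```python
from typing import Dict, Sequence


def confusion_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision, recall and specificity at a fixed (assumed) threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Guard against empty classes; report 0.0 rather than dividing by zero.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),       # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney statistic); O(P*N)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1, 0]
    index = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
    print(confusion_metrics(labels, index))
    print(auc(labels, index))  # 7 of 9 positive-negative pairs are ranked correctly
```

The pairwise AUC is quadratic in the number of samples. For full county-scale grids, a sort-based rank computation, or a library routine such as scikit-learn's `roc_auc_score`, is the practical choice.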


Pages: 15683-15713

Page count: 31
