Machine learning with a susceptibility index-based sampling strategy for landslide susceptibility assessment

Cited by: 8
Authors
Liu, Lei-Lei [1]
Zhang, Yi-Li [1]
Zhang, Shao-He [1]
Shu, Biao [1]
Xiao, Ting [1]
Affiliations
[1] Cent South Univ, Sch Geosci & Infophys, Minist Educ, Key Lab Metallogen Predict Nonferrous Met & Geol, Changsha, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Landslide susceptibility; susceptibility index; machine learning; sampling strategy; support vector machine; logistic regression; random forest; spatial prediction; frequency ratio; decision tree; absence data; models; classification; selection
DOI
10.1080/10106049.2022.2102221
Chinese Library Classification (CLC)
X [Environmental science, safety science]
Discipline classification code
08; 0830
Abstract
Past landslides recorded in a landslide inventory are essential for machine learning (ML)-based landslide susceptibility assessment (LSA). They determine not only the positive samples (the past landslides themselves) but also the negative samples (randomly generated based on the past landslides) used to train and validate the ML models. However, the number of recorded landslides is often limited by constraints on time, budget, and available resources. The data available for building landslide susceptibility ML models are therefore limited, which limits the accuracy and reliability of the resulting models. This article proposes a new landslide susceptibility index-based sampling strategy that enriches the positive and negative samples used for model training and validation, with the aim of improving ML-based LSA. First, landslide susceptibility analysis is conducted on initial datasets compiled from the landslide inventory, using three ML models, i.e., random forest (RF), gradient boosting decision tree (GBDT) and support vector machine (SVM), to obtain the initial landslide susceptibility indices at different spatial locations. Then, the proposed sampling strategy takes the grid units with very high and very low landslide susceptibility indices directly as potential positive and negative samples, respectively, to enrich the initial dataset; the number of these positive and negative samples is determined by a Bayesian optimization algorithm. Thereafter, the ML models are updated with the enriched datasets. Finally, to verify the effectiveness of the proposed strategy, the improved models are applied to assess the landslide susceptibility of Taojiang County, China, and the results are compared with those of the corresponding initial models without updating. The results show that, compared with the initial RF, GBDT and SVM models, the corresponding improved models perform better in accuracy, precision, recall, and specificity. In particular, the AUC values of the three models increase by 11.94%, 10.72% and 0.57%, respectively.
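The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_pseudo_samples`, its parameters, and the toy index values are hypothetical, and the counts `n_pos` and `n_neg` are fixed by hand here, whereas the article determines them with a Bayesian optimization algorithm. The sketch assumes the susceptibility indices have already been produced by an initial model.

```python
from typing import List, Sequence, Tuple


def select_pseudo_samples(
    indices: Sequence[float],
    labelled: Sequence[int],
    n_pos: int,
    n_neg: int,
) -> Tuple[List[int], List[int]]:
    """Pick unlabelled grid units with the highest and lowest susceptibility index.

    indices  -- susceptibility index of every grid unit (from an initial model)
    labelled -- positions of grid units already in the initial dataset
    n_pos    -- number of extra positive samples (hand-set assumption)
    n_neg    -- number of extra negative samples (hand-set assumption)
    """
    known = set(labelled)
    # Only grid units outside the initial dataset are candidates.
    candidates = [i for i in range(len(indices)) if i not in known]
    # Rank candidates by susceptibility index, lowest first.
    ranked = sorted(candidates, key=lambda i: indices[i])
    # Clamp the counts so the two selections never overlap.
    n_neg = max(0, min(n_neg, len(ranked)))
    n_pos = max(0, min(n_pos, len(ranked) - n_neg))
    negatives = ranked[:n_neg]                  # very low index
    positives = ranked[len(ranked) - n_pos:]    # very high index
    return positives, negatives


# Toy example: eight grid units, two of which are already labelled.
lsi = [0.02, 0.95, 0.40, 0.88, 0.10, 0.99, 0.05, 0.60]
initial = [5, 0]
pos, neg = select_pseudo_samples(lsi, initial, n_pos=2, n_neg=2)
print("extra positives:", pos)
print("extra negatives:", neg)
```

In a full workflow, the selected units would be appended to the initial dataset with their assumed labels and the model refitted on the enriched dataset, with `n_pos` and `n_neg` tuned rather than fixed.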
Pages: 15683-15713
Page count: 31