Evaluation of linear, nonlinear and ensemble machine learning models for landslide susceptibility assessment in southwest China

Cited by: 16
Authors
Wang, Bingwei [1]
Lin, Qigen [1]
Jiang, Tong [1]
Yin, Huaxiang [1]
Zhou, Jian [1]
Sun, Jinhao [1]
Wang, Dongfang [1]
Dai, Ran [1]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, Inst Disaster Risk Management, Collaborat Innovat Ctr Forecast & Evaluat Meteoro, Sch Geog Sci, Nanjing, Peoples R China
Keywords
Evaluation of machine learning models; cross-validation; landslide susceptibility assessment; southwest China; SYSTEM;
DOI
10.1080/10106049.2022.2152493
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science];
Discipline classification code
08 ; 0830 ;
Abstract
Machine learning models are gradually replacing traditional techniques for landslide susceptibility assessment. This study comprehensively compares linear, nonlinear, and ensemble models based on 5281 historical landslides in southwest China, the area most severely affected by landslide disasters. The linear models were represented by logistic regression (LR); the nonlinear models by support vector machine (SVM), artificial neural network (ANN) and the C5.0 decision tree (C5.0 DT); and the ensemble models by random forest (RF) and categorical boosting (Catboost). The correlation coefficient, variance inflation factor (VIF), and relative importance analysis were used to select the dominant landslide conditioning factors. Model performance was evaluated using multiple statistical indicators (e.g. the area under the receiver operating characteristic curve (AUC) and Kappa), cross-validation, and qualitative methods. The findings are: (1) Regarding predictive performance, the ensemble models Catboost (AUC = 0.823 and Kappa = 0.593) and RF (AUC = 0.821 and Kappa = 0.582) performed best, followed by the nonlinear models SVM (AUC = 0.775 and Kappa = 0.520), ANN (AUC = 0.770 and Kappa = 0.486) and C5.0 DT (AUC = 0.751 and Kappa = 0.497), while the linear model LR (AUC = 0.756 and Kappa = 0.456) showed more limited performance. Ensemble models that use trees as base classifiers show considerable potential for landslide susceptibility studies. (2) Regarding robustness, the three types of models showed relatively similar predictive power under nonspatial cross-validation (CV), whereas under spatial cross-validation (SPCV) the linear model LR (median AUC = 0.714) achieved better results than the ensemble and nonlinear models. This implies that when landslides are not homogeneously distributed, linear models may be the most robust. It is advisable to consider evaluation metrics from several perspectives and to combine them with expert qualitative geomorphological knowledge when determining the best model. (3) The Gini index-based RF model suggests that road density was the dominant factor in the frequency of landslides in the study area.
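The two headline indicators reported in the abstract, AUC and Kappa, can be illustrated with a minimal sketch. This is not the authors' code: the function names `cohen_kappa` and `roc_auc`, the toy labels and scores, and the 0.5 classification threshold are all assumptions made for illustration, and none of the numbers come from the study.

```python
# Illustrative sketch only: toy data, hypothetical helper names.

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for two equally long sequences of binary labels (0/1)."""
    n = len(y_true)
    # Observed agreement: share of cases where the labels match.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement, computed from the marginal class frequencies.
    p_true_pos = sum(y_true) / n
    p_pred_pos = sum(y_pred) / n
    expected = p_true_pos * p_pred_pos + (1 - p_true_pos) * (1 - p_pred_pos)
    return (observed - expected) / (1 - expected)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = landslide cell, 0 = non-landslide cell (made-up values).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
# Assumed threshold of 0.5 to turn susceptibility scores into classes.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("Kappa:", cohen_kappa(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

AUC depends only on how the scores rank the cases, while Kappa depends on the chosen threshold, which is one reason the two indicators can order models differently.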
Pages: 29
Related papers
73 in total
[1]   Enhanced classification and regression tree (CART) by genetic algorithm (GA) and grid search (GS) for flood susceptibility mapping and assessment [J].
Ahmadlou, Mohammad ;
Ebrahimian Ghajari, Yasser ;
Karimi, Mohammad .
GEOCARTO INTERNATIONAL, 2022, 37 (26) :13638-13657
[2]   Comparing classical statistic and machine learning models in landslide susceptibility mapping in Ardanuc (Artvin), Turkey [J].
Akinci, Halil ;
Zeybek, Mustafa .
NATURAL HAZARDS, 2021, 108 (02) :1515-1543
[3]   Landslide susceptibility mapping using precipitation data, Mazandaran Province, north of Iran [J].
Amiri, Mohammad Arab ;
Conoscenti, Christian .
NATURAL HAZARDS, 2017, 89 (01) :255-273
[4]   Decision tree based ensemble machine learning approaches for landslide susceptibility mapping [J].
Arabameri, Alireza ;
Chandra Pal, Subodh ;
Rezaie, Fatemeh ;
Chakrabortty, Rabin ;
Saha, Asish ;
Blaschke, Thomas ;
Di Napoli, Mariano ;
Ghorbanzadeh, Omid ;
Thi Ngo, Phuong Thao .
GEOCARTO INTERNATIONAL, 2022, 37 (16) :4594-4627
[5]   Comparative assessment using boosted regression trees, binary logistic regression, frequency ratio and numerical risk factor for gully erosion susceptibility modelling [J].
Arabameri, Alireza ;
Pradhan, Biswajeet ;
Lombardo, Luigi .
CATENA, 2019, 183
[6]   Comparison of multiple conventional and unconventional machine learning models for landslide susceptibility mapping of Northern part of Pakistan [J].
Aslam, Bilal ;
Zafar, Adeel ;
Khalil, Umer .
ENVIRONMENT DEVELOPMENT AND SUSTAINABILITY, 2022,
[7]   Effectiveness of Training Sample and Features for Random Forest on Road Extraction from Unmanned Aerial Vehicle-Based Point Cloud [J].
Bicici, Serkan ;
Zeybek, Mustafa .
TRANSPORTATION RESEARCH RECORD, 2021, 2675 (12) :401-418
[8]   Breiman L., 2001, MACH LEARN, V45, P5, DOI 10.1023/A:1010933404324
[9]   SPATIAL CROSS-VALIDATION AND BOOTSTRAP FOR THE ASSESSMENT OF PREDICTION RULES IN REMOTE SENSING: THE R PACKAGE SPERROREST [J].
Brenning, Alexander .
2012 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2012, :5372-5375
[10]   Centre for Research on the Epidemiology of Disasters-CRED, 2022, EM-DAT The international Disaster Database