A random forest model of landslide susceptibility mapping based on hyperparameter optimization using Bayes algorithm

Cited by: 329
Authors
Sun, Deliang [1]
Wen, Haijia [2, 3, 4]
Wang, Danzhou [1]
Xu, Jiahui [1]
Affiliations
[1] Chongqing Normal Univ, Key Lab GIS Applicat Res, Chongqing 401331, Peoples R China
[2] Minist Educ, Key Lab New Technol Construct Cities Mt Area, Chongqing 400045, Peoples R China
[3] Natl Joint Engn Res Ctr Geohazards Prevent Reserv, Chongqing 400044, Peoples R China
[4] Chongqing Univ, Sch Civil Engn, Chongqing 400045, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Bayes algorithm; Random forest; Landslide susceptibility mapping; Hyperparameter optimization; Factor screening; LOGISTIC-REGRESSION; SPATIAL PREDICTION; HYPER-PARAMETERS; CLASSIFICATION; BIVARIATE; COUNTY; TREE;
DOI
10.1016/j.geomorph.2020.107201
Chinese Library Classification
P9 [Physical Geography]
Discipline classification code
0705; 070501
Abstract
The choice of model parameters is a major determinant of model accuracy in landslide susceptibility mapping. The purpose of this study is to optimize the hyperparameters of a random forest model with a Bayesian optimization algorithm and thereby obtain a high-accuracy random forest landslide susceptibility evaluation model. The research steps are as follows. First, taking a typical landslide-prone mountainous area as an example, 16 conditioning factors of landslide susceptibility, such as elevation, annual average rainfall, distance from roads and distance from buildings, were preliminarily selected. Combined with 1520 historical landslide events, a geospatial database was established at 30 m resolution. Second, the geospatial data sample set was constructed by random sampling with a ratio of historical landslides to non-landslides of 1:10. Based on the whole sample set, the hyperparameters of the random forest model were optimized with the Bayesian optimization algorithm. Next, the model was trained with the optimal hyperparameters to obtain the landslide susceptibility evaluation model, which was then used to map landslide susceptibility for the whole study area. After that, the recursive feature elimination method was used to screen out the dominant conditioning factors that can explain the degree of landslide susceptibility. The results indicated that the area under the curve (AUC) values of the receiver operating characteristic (ROC) curve for the training data set, the verification data set and the regional simulation were 0.95, 0.87 and 0.93, respectively. 65% of the historical landslides fell within the high and very high susceptibility regions, which made up <20% of the research area. The model was in good agreement with the distribution characteristics of historical landslides in the study area.
We noted that all three recent landslides affecting the study area occurred at locations that the model predicted to have high or very high susceptibility. As for conditioning factors, those related to human activities accounted for a large proportion of the contribution. In conclusion, a high-precision random forest landslide susceptibility evaluation model can be built based on hyperparameter optimization with the Bayesian optimization algorithm. Simultaneously, using the recursive feature elimination method, a random forest landslide susceptibility model with fewer dominant conditioning factors and guaranteed evaluation accuracy can also be built, saving running time and input data resources. (C) 2020 Elsevier B.V. All rights reserved.
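The workflow described in the abstract (1:10 sampling, Bayesian optimization of random forest hyperparameters, AUC evaluation, recursive feature elimination) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The synthetic data, the candidate grid `candidates`, the Gaussian-process surrogate with an expected-improvement acquisition, the 3-fold cross-validation objective, the 70/30 split and the choice of 8 retained factors are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the geospatial sample set (assumption):
# 16 conditioning factors, roughly 1 landslide per 10 non-landslide samples.
X, y = make_classification(n_samples=1100, n_features=16, n_informative=6,
                           weights=[10 / 11], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Hypothetical discrete search space: (n_estimators, max_depth, max_features).
candidates = np.array([(n, d, f)
                       for n in (25, 50, 100)
                       for d in (3, 6, 9, 12)
                       for f in (2, 4, 8)], dtype=float)
scale = candidates.max(axis=0)  # rescale inputs for the surrogate model


def objective(params):
    """Mean cross-validated AUC of a random forest with the given settings."""
    n, d, f = (int(v) for v in params)
    model = RandomForestClassifier(n_estimators=n, max_depth=d,
                                   max_features=f, random_state=0)
    return cross_val_score(model, X_train, y_train, cv=3,
                           scoring="roc_auc").mean()


def expected_improvement(mu, sigma, best):
    """Expected improvement over the best score seen so far (maximization)."""
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)


# Start from a few random configurations, then let the surrogate pick the rest.
rng = np.random.default_rng(0)
tried = [int(i) for i in rng.choice(len(candidates), size=4, replace=False)]
scores = [objective(candidates[i]) for i in tried]

for _ in range(6):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                  normalize_y=True, random_state=0)
    gp.fit(candidates[tried] / scale, scores)
    mu, sigma = gp.predict(candidates / scale, return_std=True)
    ei = expected_improvement(mu, sigma, max(scores))
    ei[tried] = -np.inf  # never re-evaluate a configuration
    nxt = int(np.argmax(ei))
    tried.append(nxt)
    scores.append(objective(candidates[nxt]))

best_params = candidates[tried[int(np.argmax(scores))]].astype(int)
n, d, f = (int(v) for v in best_params)

# Train with the selected hyperparameters and score on the held-out split.
final = RandomForestClassifier(n_estimators=n, max_depth=d, max_features=f,
                               random_state=0).fit(X_train, y_train)
val_auc = roc_auc_score(y_val, final.predict_proba(X_val)[:, 1])

# Recursive feature elimination down to 8 factors (the count is an assumption).
selector = RFE(RandomForestClassifier(n_estimators=n, max_depth=d,
                                      random_state=0),
               n_features_to_select=8).fit(X_train, y_train)

print("best (n_estimators, max_depth, max_features):", best_params)
print("validation AUC:", round(val_auc, 3))
print("retained factor indices:", np.flatnonzero(selector.support_))
```

The surrogate only sees 10 of the 36 candidate configurations, which is the point of a Bayesian search: each random forest evaluation is comparatively expensive, so the next configuration is chosen where the expected improvement in AUC is largest rather than by exhaustive enumeration.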
Pages: 14