Addressing class imbalance in soil movement predictions

Times cited: 0
Authors
Kumar, Praveen [1]
Priyanka, Priyanka [1]
Uday, Kala Venkata [2]
Dutt, Varun [1]
Affiliations
[1] Indian Inst Technol Mandi, Appl Cognit Sci Lab, Kamand 175075, Himachal Pradesh, India
[2] Indian Inst Technol Mandi, Geotech Lab, Kamand 175075, Himachal Pradesh, India
Keywords
SMOTE
DOI
10.5194/nhess-24-1913-2024
Chinese Library Classification number
P [Astronomy, Earth Sciences]
Discipline classification code
07
Abstract
Landslides threaten human life and infrastructure, resulting in fatalities and economic losses. Monitoring stations provide valuable data for predicting soil movement, which is crucial in mitigating this threat. Accurately predicting soil movement from monitoring data is challenging due to its complexity and inherent class imbalance. This study proposes developing machine learning (ML) models with oversampling techniques to address the class imbalance issue and develop a robust soil movement prediction system. The dataset, comprising 2 years (2019-2021) of monitoring data from a landslide in Uttarakhand, has a 70:30 ratio of training and testing data. To tackle the class imbalance problem, various oversampling techniques, including the synthetic minority oversampling technique (SMOTE), K-means SMOTE, borderline-SMOTE, and adaptive SMOTE (ADASYN), were applied to the training dataset. Several ML models, namely random forest (RF), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), adaptive boosting (AdaBoost), category boosting (CatBoost), long short-term memory (LSTM), multilayer perceptron (MLP), and a dynamic ensemble, were trained and compared for soil movement prediction. A 5-fold cross-validation method was applied to optimize the ML models on the training data, and the models were tested on the testing set. Among these ML models, the dynamic ensemble model with K-means SMOTE performed the best in testing, with an accuracy, precision, and recall rate of 0.995, 0.995, and 0.995, respectively, and an F1 score of 0.995. Additionally, models without oversampling exhibited poor performance in training and testing, highlighting the importance of incorporating oversampling techniques to enhance predictive capabilities.
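The abstract states that oversampling was applied to the training dataset only, after a 70:30 split. As a rough illustration of that ordering, the sketch below implements plain SMOTE (not the K-means variant the abstract reports as best) in standard-library Python. It is not the authors' code: the function names `smote` and `oversample_training_set`, the binary 0/1 labels, the random (non-temporal) split, and the default of 5 neighbours are all assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    if n_new <= 0:
        return []
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def oversample_training_set(X, y, minority_label=1, test_fraction=0.3, seed=0):
    """Split first (assumed random 70:30), then balance only the training part."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_test = int(round(test_fraction * len(X)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]

    # Assumes a binary problem: everything that is not minority is majority.
    minority = [x for x, lab in zip(X_train, y_train) if lab == minority_label]
    n_majority = len(y_train) - len(minority)
    new_points = smote(minority, n_majority - len(minority), seed=seed)

    X_bal = X_train + new_points
    y_bal = y_train + [minority_label] * len(new_points)
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_bal, y_bal, X_test, y_test
```

The test portion is left untouched so that reported metrics reflect the original class distribution; synthetic points never leak into evaluation.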
Pages: 1913-1928
Number of pages: 16