An empirical assessment of smote variants techniques and interpretation methods in improving the accuracy and the interpretability of student performance models

Cited by: 12
Authors
Sahlaoui, Hayat [1]
Alaoui, El Arbi Abdellaoui [2]
Agoujil, Said [3]
Nayyar, Anand [4]
Affiliations
[1] Moulay Ismail Univ Meknes, Fac Sci & Tech Errachidia, Dept Comp Sci, Route Meknes, Errachidia, Morocco
[2] Moulay Ismail Univ Meknes, Ecole Normale Super, Dept Sci, IEVIA Team,IMAGE Lab, Meknes, Morocco
[3] Moulay Ismail Univ Meknes, Ecole Natl Commerce & Gest, El Hajeb, Morocco
[4] Duy Tan Univ, Fac Informat Technol, Grad Sch, Da Nang 550000, Vietnam
Keywords
Machine learning; Classification; Imbalanced data issue; SMOTE variants; Friedman test; SHAP; LIME; Student performance; INTENSIVE-CARE UNITS; MORTALITY PREDICTION; CLASSIFICATION; RISK; ICU
DOI
10.1007/s10639-023-12007-w
Chinese Library Classification (CLC) number
G40 [Education]
Subject classification codes
040101; 120403
Abstract
Predicting student performance from educational data is a significant area of machine learning research. However, class imbalance in these datasets can hinder accuracy, and developing interpretable models remains a challenge. This study compares several variants of the Synthetic Minority Oversampling Technique (SMOTE), each combined with different classification algorithms, to build prediction models. The results show that SMOTE with Edited Nearest Neighbors (SMOTE-ENN) outperforms the other variants, and that the balanced random forest classifier achieves its best results with SMOTE-ENN, reaching 96% accuracy, precision, and F-measure. SMOTE also has a faster execution time. For model interpretability, combining Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) provides deeper insights: LIME is suited to interpreting single predictions, while SHAP is better for interpreting the model as a whole. This research offers guidelines for mitigating data imbalance and improving fairness in education through data-driven innovations such as early warning systems. It also introduces academics to explainability approaches, facilitating wider use of machine learning methods.
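The abstract identifies SMOTE combined with Edited Nearest Neighbors (SMOTE-ENN) as the strongest resampling approach. To show roughly what that combination does, the pure-Python sketch below first creates synthetic minority rows by interpolating between a minority sample and one of its k nearest minority neighbors (SMOTE). It then drops every row whose k nearest neighbors mostly carry a different label (ENN, using the majority-vote rule). This is not the authors' code. The function names, the toy pass/fail data, and the defaults (k=5 for SMOTE, k=3 for ENN) are assumptions for illustration only, and library implementations such as imbalanced-learn's may differ in details like the ENN selection rule.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_nearest(idx, X, k, candidates):
    """Indices of the k candidates closest to X[idx], excluding idx itself."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: sq_dist(X[idx], X[j]))
    return others[:k]


def smote(X, y, minority, k=5, seed=0):
    """SMOTE: add synthetic `minority` rows until that class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_new = max(counts.values()) - counts[minority]
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    X_out, y_out = [list(row) for row in X], list(y)
    for _ in range(n_new):
        i = rng.choice(minority_idx)                      # random minority sample
        j = rng.choice(k_nearest(i, X, k, minority_idx))  # one of its minority neighbors
        gap = rng.random()                                # interpolation factor in [0, 1)
        X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Edited Nearest Neighbors: drop rows whose k nearest neighbors mostly disagree with their label."""
    keep = []
    for i in range(len(X)):
        neighbors = k_nearest(i, X, k, range(len(X)))
        majority_label, _ = Counter(y[j] for j in neighbors).most_common(1)[0]
        if majority_label == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority, k_smote=5, k_enn=3, seed=0):
    """SMOTE oversampling followed by ENN cleaning of the resampled set."""
    X_res, y_res = smote(X, y, minority, k=k_smote, seed=seed)
    return enn(X_res, y_res, k=k_enn)


# Hypothetical toy data: 8 "pass" and 3 "fail" students described by two scores.
X = [[0.9, 0.8], [0.85, 0.9], [0.8, 0.75], [0.95, 0.7], [0.7, 0.85],
     [0.88, 0.92], [0.78, 0.8], [0.92, 0.88],
     [0.2, 0.3], [0.25, 0.2], [0.3, 0.35]]
y = ["pass"] * 8 + ["fail"] * 3

X_bal, y_bal = smote(X, y, minority="fail")
X_clean, y_clean = smote_enn(X, y, minority="fail")
print(Counter(y_bal))    # 8 rows per class after oversampling
print(Counter(y_clean))  # the clusters are well separated, so ENN removes nothing here
```

In a full pipeline, the resampled set would then be used to train a classifier such as a balanced random forest. The brute-force neighbor search used here is only practical for small datasets.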
Pages: 5447-5483
Page count: 37