Optimizing Multi-Class Classification in Educational Data with Ensemble Learning and Data Balancing Techniques

Times cited: 0
Authors
Al-Hammouri, Mohammad F. [1]
Hammouri, Ziad Akram Ali [2]
Almalkawi, Islam T. [1]
Lafee, Ansam [1]
Affiliations
[1] Hashemite University, Department of Computer Engineering, Zarqa, Jordan
[2] Middle East University, Department of Computer Science, Amman, Jordan
Source
2024 Fifth International Conference on Intelligent Data Science Technologies and Applications (IDSTA), 2024
Keywords
multiclass classification; machine learning; data balancing; student dropout; imbalanced datasets
DOI
10.1109/IDSTA62194.2024.10746987
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The field of education is increasingly embracing AI tools to improve student outcomes. This work aims to reduce academic failure in higher education by using machine learning techniques to identify at-risk students early in their studies, so that supportive strategies can be put in place to assist them. The study uses a dataset from a higher education institution to develop a classification model that predicts students' academic performance. The problem is formulated as a multi-class classification task with three categories (Graduate, Enrolled, and Dropout), and the class distribution is strongly imbalanced toward Graduate. To improve prediction accuracy on the minority classes, the data balancing technique SMOTE combined with Edited Nearest Neighbors (SMOTE-ENN) is applied. Three popular classification models are employed: Random Forest, XGBoost, and CatBoost. The findings show that SMOTE-ENN significantly improves classification results. XGBoost achieved the highest overall accuracy (94.6%) across all three classes, as confirmed by the confusion-matrix evaluation, exceeding previously reported results in the literature. Deploying these models allows accurate prediction of students' performance and helps reduce dropout rates.
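The SMOTE-ENN balancing step named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the function names (`smote`, `enn`, `smote_enn`), the neighbor counts (k=5 for SMOTE, k=3 for ENN), the Euclidean distance, and the toy two-dimensional data are all illustrative assumptions. SMOTE creates synthetic samples by interpolating between a minority-class sample and one of its same-class nearest neighbors until every class matches the largest one; ENN then removes any sample unless a strict majority of its k nearest neighbors share its label (Wilson's editing rule). In practice a maintained library such as imbalanced-learn (`imblearn.combine.SMOTEENN`) would normally be used, and resampling is commonly applied to the training split only.

```python
import math
import random
from collections import Counter


def _dist(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def _k_nearest(point, candidates, k):
    # Positions (in `candidates`) of the k points closest to `point`
    order = sorted(range(len(candidates)), key=lambda i: _dist(point, candidates[i]))
    return order[:k]


def smote(X, y, k=5, seed=0):
    """Oversample every smaller class up to the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, n in counts.items():
        if n >= target or n < 2:
            continue  # largest class, or too few samples to interpolate between
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            others = [j for j in idx if j != i]
            near = _k_nearest(X[i], [X[j] for j in others], min(k, len(others)))
            j = others[rng.choice(near)]
            gap = rng.random()  # position along the segment X[i] -> X[j]
            X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
            y_out.append(label)
    return X_out, y_out


def enn(X, y, k=3):
    """Wilson's ENN: keep a sample only if a strict majority of its k nearest
    neighbors (among all other samples) share its label."""
    X_keep, y_keep = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        others = [j for j in range(len(X)) if j != i]
        near = _k_nearest(x, [X[j] for j in others], k)
        agree = sum(1 for n in near if y[others[n]] == label)
        if 2 * agree > k:
            X_keep.append(x)
            y_keep.append(label)
    return X_keep, y_keep


def smote_enn(X, y, seed=0):
    # Oversample first, then clean the enlarged data set
    return enn(*smote(X, y, seed=seed))


# Hypothetical toy 2-D data using the three class labels from the abstract
X = (
    [[(i % 4) * 0.1, (i // 4) * 0.1] for i in range(12)]  # 12 Graduate
    + [[5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]]    # 4 Enrolled
    + [[10.0, 0.0], [10.1, 0.0], [10.0, 0.1]]             # 3 Dropout
)
y = ["Graduate"] * 12 + ["Enrolled"] * 4 + ["Dropout"] * 3

X_bal, y_bal = smote_enn(X, y)
print(Counter(y_bal))
```

On this toy data SMOTE lifts Enrolled and Dropout to 12 samples each, and ENN removes nothing because the three clusters are far apart; a mislabeled sample placed inside another class's cluster would be dropped by `enn`. The balanced output would then be passed to the classifiers (Random Forest, XGBoost, CatBoost), which this sketch does not cover.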
Pages: 12-17
Number of pages: 6
References (21 in total)
[1] Attiya, W. M. 2023 International Conference on Cyber Management and Engineering (CyMaEn), 2023: 171.
[2] Batista, G. E. A. P. A. ACM SIGKDD Explorations Newsletter, 2004, 6: 20. DOI: 10.1145/1007730.1007735.
[3] Brdesee, Hani Sami; Alsaggaf, Wafaa; Aljohani, Naif; Hassan, Saeed-Ul. Predictive Model Using a Machine Learning Approach for Enhancing the Retention Rate of Students At-Risk. International Journal on Semantic Web and Information Systems, 2022, 18(1).
[4] Breiman, L. Random forests. Machine Learning, 2001, 45(1): 5-32.
[5] Chawla, Nitesh V.; Bowyer, Kevin W.; Hall, Lawrence O.; Kegelmeyer, W. Philip. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
[6] Chen, Tianqi; Guestrin, Carlos. XGBoost: A Scalable Tree Boosting System. KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016: 785-794.
[7] Del Bonifro, Francesca; Gabbrielli, Maurizio; Lisanti, Giuseppe; Zingaro, Stefano Pio. Student Dropout Prediction. Artificial Intelligence in Education (AIED 2020), Part I, 2020, 12163: 129-140.
[8] Fernández-Delgado, M. Journal of Machine Learning Research, 2014, 15: 3133.
[9] Gupta, Kumar. 2024 11th International Conference on Computing for Sustainable Global Development (INDIACom), 2024: 709. DOI: 10.23919/INDIACom61295.2024.10498546.
[10] Kiss, Botond. 2019 17th International Conference on Emerging eLearning Technologies and Applications (ICETA), Proceedings, 2019: 383. DOI: 10.1109/ICETA48886.2019.9040158.