Predictive Models for Imbalanced Data: A School Dropout Perspective

Cited by: 33
Authors
Barros, Thiago M. [1]
Souza Neto, Placido A. [1]
Silva, Ivanovitch [2]
Guedes, Luiz Affonso [2]
Affiliations
[1] Fed Inst Rio Grande Norte IFRN, 1559 Tirol, Natal, RN, Brazil
[2] Fed Univ Rio Grande Norte UFRN, BR-59078970 Natal, RN, Brazil
Source
EDUCATION SCIENCES | 2019 / Vol. 9 / Issue 04
Keywords
dropout rates; accuracy paradox; imbalanced learning; downsample; g-mean predict; mlp; decision tree; Balanced Bagging; UAR; SMOTE; ADASYN; STUDENTS PERFORMANCE; NETWORKS; ONLINE;
DOI
10.3390/educsci9040275
Chinese Library Classification number
G40 [Education];
Discipline classification code
040101 ; 120403 ;
Abstract
Predicting school dropout rates is an important issue for the smooth operation of an educational system. The problem is addressed by classifying students into two classes using statistical datasets on educational activities: one class identifies students who tend to persist, and the other identifies students who tend to drop out. This problem often encounters a phenomenon that masks the obtained results. This study examines this phenomenon and provides a reliable educational data mining technique that accurately predicts dropout rates. In particular, three classification techniques are used: decision tree, neural networks and Balanced Bagging. The performance of these classifiers is tested with and without data balancing by downsampling, SMOTE and ADASYN. Among the metrics considered, the geometric mean and UAR are found to provide reliable results when predicting dropout rates with the Balanced Bagging classifier.
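The masking effect mentioned in the abstract (the keywords call it the accuracy paradox) can be illustrated with a small sketch. It uses the standard textbook definitions of UAR (unweighted average recall, the mean of per-class recalls) and the binary geometric mean (the square root of the product of the two class recalls); the abstract does not state the paper's exact formulas, so these definitions are an assumption. The class sizes (90 persisting students, 10 dropouts), the label encoding and the function names are hypothetical and chosen only for illustration.

```python
from math import sqrt

def accuracy(y_true, y_pred):
    """Fraction of all samples that were labeled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall_for(y_true, y_pred, label):
    """Recall of one class: correct predictions among samples of that class."""
    preds = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in preds) / len(preds)

def uar(y_true, y_pred):
    """Unweighted average recall: every class counts equally."""
    labels = sorted(set(y_true))
    return sum(recall_for(y_true, y_pred, c) for c in labels) / len(labels)

def g_mean(y_true, y_pred):
    """Binary geometric mean of the two class recalls (labels 0 and 1)."""
    return sqrt(recall_for(y_true, y_pred, 0) * recall_for(y_true, y_pred, 1))

# Hypothetical imbalanced data: 0 = persists, 1 = drops out.
y_true = [0] * 90 + [1] * 10
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100

print("accuracy:", accuracy(y_true, y_pred))
print("UAR:     ", uar(y_true, y_pred))
print("g-mean:  ", g_mean(y_true, y_pred))
```

Under these assumptions the trivial classifier scores high on accuracy while never identifying a single dropout, whereas UAR falls to chance level and the geometric mean to zero. This is one plausible reading of why the abstract singles out those two metrics for imbalanced data.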
Pages: 17