Early Prediction of At-Risk Students in Secondary Education: A Countrywide K-12 Learning Analytics Initiative in Uruguay

Cited by: 10
Authors
Queiroga, Emanuel Marques [1, 2]
Batista Machado, Matheus Francisco [3]
Paragarino, Virginia Rodes [4]
Primo, Tiago Thompsen [2]
Cechinel, Cristian [2, 3]
Affiliations
[1] IFSul, Inst Fed Rio Grande do Sul, BR-96015560 Pelotas, RS, Brazil
[2] Univ Fed Pelotas UFPel, Ctr Desenvolvimento Tecnol CDTEC, BR-96010610 Pelotas, RS, Brazil
[3] Univ Fed Santa Catarina UFSC, Ctr Ciencias Tecnol & Saude CTS, BR-88906072 Ararangua, Brazil
[4] Univ Republica, Comis Sectorial Ensenanza, Udelar, Montevideo 11200, Uruguay
Keywords
classification; educational strategies; secondary education; learning analytics; at-risk prediction; dropout prediction; bias analysis; fairness in machine learning; courses; drivers; gender
DOI
10.3390/info13090401
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper describes a nationwide learning analytics initiative in Uruguay intended to support the future implementation of governmental policies to reduce grade retention and dropout in secondary education. Data from 258,440 students were used to build automated models that predict which students are at risk of failing or dropping out. The data were collected from several primary and secondary education sources and cover the period from 2015 to 2020. They contain demographic information about the students and their trajectories from the first grade of primary school to the second grade of secondary school (e.g., student assessments in different subjects over the years, the number of absences, participation in social welfare programs, and the zone of the school, among other factors). Predictive models were trained with the random forest algorithm, and their performance was evaluated with the F1-Macro and AUROC measures. The models were designed to be applied at different points of the school year (before the beginning of the school year and after the first evaluation meeting for each grade), for both regular and technical secondary school. A total of eight predictive models were developed following this temporal approach, and after a bias analysis considering three protected attributes (gender, school zone, and social welfare program participation), seven of them were approved for use in prediction. The models achieved performance that is outstanding by the standards of the literature, with an AUROC higher than 0.90 and an F1-Macro higher than 0.88. The paper describes in depth the characteristics of the data gathered, the specifics of data preprocessing, and the methodology followed for model generation and bias analysis, together with the architecture developed for the deployment of the predictive models. Among other findings, the results corroborate the importance that the literature attaches to using students' previous performance to predict their future performance.
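The evaluation setup named in the abstract (a random forest scored with F1-Macro and AUROC, followed by a check across a protected attribute) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the `group` column is a hypothetical binary protected attribute, the hyperparameters are arbitrary, and comparing per-group F1-Macro is only one simple way to look for disparities, assumed here for illustration rather than taken from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for student records (assumption: not the Uruguayan data).
# Class 1 plays the role of "at risk" and is the minority class.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)

# Hypothetical binary protected attribute, assigned at random for illustration.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=len(y))

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.3, stratify=y, random_state=0
)

# Random forest classifier with arbitrary, untuned hyperparameters.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_tr, y_tr)

pred = model.predict(X_te)               # hard labels, used for F1-Macro
proba = model.predict_proba(X_te)[:, 1]  # class-1 scores, used for AUROC

f1_macro = f1_score(y_te, pred, average="macro")
auroc = roc_auc_score(y_te, proba)

# Simple disparity check: F1-Macro computed separately for each group value.
per_group_f1 = {
    int(g): f1_score(y_te[g_te == g], pred[g_te == g], average="macro")
    for g in np.unique(g_te)
}

print(f"F1-Macro: {f1_macro:.3f}  AUROC: {auroc:.3f}")
print("F1-Macro per group:", per_group_f1)
```

A large gap between the per-group scores would be a reason to inspect a model more closely before using it; what counts as an acceptable gap is a policy decision that this sketch does not address.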
Pages: 25