Balancing Fine-Tuned Machine Learning Models Between Continuous and Discrete Variables - A Comprehensive Analysis Using Educational Data

Cited by: 3
Authors
Drousiotis, Efthyvoulos [1]
Pentaliotis, Panagiotis [1]
Shi, Lei [2]
Cristea, Alexandra, I [2]
Affiliations
[1] Univ Liverpool, Dept Elect Engn & Elect, Liverpool, Merseyside, England
[2] Univ Durham, Dept Comp Sci, Durham, England
Source
ARTIFICIAL INTELLIGENCE IN EDUCATION, PT I | 2022 / Vol. 13355
Keywords
Neural networks; Tree-based algorithms; Educational data mining; Feature engineering; MOOCs;
DOI
10.1007/978-3-031-11644-5_21
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Along with the exponential increase of students enrolling in MOOCs [26] arises the problem of a high student dropout rate. Researchers worldwide are interested in predicting whether students will drop out of MOOCs, in order to prevent it. This study explores and improves ways of handling notoriously challenging continuous-variable datasets to predict dropout. Importantly, we propose a fair comparison methodology: unlike prior studies, and for the first time, when comparing various models we use algorithms with the dataset they are intended for, thus 'like for like'. We use a time-series dataset with algorithms suited for time series, and a discrete-variables dataset, converted through feature engineering, with algorithms known to handle discrete variables well. Moreover, in terms of predictive ability, we examine the importance of finding the optimal hyperparameters for our algorithms, in combination with the most effective pre-processing techniques for the data. We show that these much lighter discrete models outperform the time-series models, enabling faster training and testing. This result also holds over fine-tuning of pre-processing and hyperparameter optimisation.
Pages: 256-268
Number of pages: 13
Related papers
32 in total
[1]   A survey on data-efficient algorithms in big data era [J].
Adadi, Amina .
JOURNAL OF BIG DATA, 2021, 8 (01)
[2]   Predicting MOOCs Dropout Using Only Two Easily Obtainable Features from the First Week's Activities [J].
Alamri, Ahmed ;
Alshehri, Mohammad ;
Cristea, Alexandra ;
Pereira, Filipe D. ;
Oliveira, Elaine ;
Shi, Lei ;
Stewart, Craig .
INTELLIGENT TUTORING SYSTEMS (ITS 2019), 2019, 11528 :163-173
[3]   Bergstra J, 2012, J MACH LEARN RES, V13, P281
[4]   Biewald L., 2020, Experiment Tracking with Weights and Biases
[5]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[6]   BART: BAYESIAN ADDITIVE REGRESSION TREES [J].
Chipman, Hugh A. ;
George, Edward I. ;
McCulloch, Robert E. .
ANNALS OF APPLIED STATISTICS, 2010, 4 (01) :266-298
[7]   Model uncertainty [J].
Clyde, M ;
George, EI .
STATISTICAL SCIENCE, 2004, 19 (01) :81-94
[8]   Early Predictor for Student Success Based on Behavioural and Demographical Indicators [J].
Drousiotis, Efthyvoulos ;
Shi, Lei ;
Maskell, Simon .
INTELLIGENT TUTORING SYSTEMS (ITS 2021), 2021, 12677 :161-172
[9]   Capturing Fairness and Uncertainty in Student Dropout Prediction - A Comparison Study [J].
Drousiotis, Efthyvoulos ;
Pentaliotis, Panagiotis ;
Shi, Lei ;
Cristea, Alexandra, I .
ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2021), PT II, 2021, 12749 :139-144
[10]   Fernández-Delgado M, 2014, J MACH LEARN RES, V15, P3133