Educational data mining to predict students' academic performance: A survey study

Cited by: 56
Authors
Batool, Saba [1]
Rashid, Junaid [2]
Nisar, Muhammad Wasif [1]
Kim, Jungeun [3]
Kwon, Hyuk-Yoon [4]
Hussain, Amir [5]
Affiliations
[1] COMSATS Univ Islamabad, Dept Comp Sci, Wah Campus, Islamabad, Pakistan
[2] Kongju Natl Univ, Dept Comp Sci & Engn, Cheonan 31080, South Korea
[3] Kongju Natl Univ, Dept Software, Dept Comp Sci & Engn, Cheonan 31080, South Korea
[4] Seoul Natl Univ Sci & Technol, Dept Ind Engn, Seoul, South Korea
[5] Edinburgh Napier Univ, Data Sci & Cyber Analyt Res Grp, Edinburgh EH11 4DY, Midlothian, Scotland
Funding
National Research Foundation, Singapore
Keywords
Educational data mining; Predictive analysis; Students attributes; ARTIFICIAL NEURAL-NETWORK; EARLY WARNING SYSTEMS; ARCHITECTURE STUDENTS; DROPOUT PREDICTION; GENETIC ALGORITHMS; DECISION TREE; MODEL; ONLINE; COURSES; CLASSIFICATION;
DOI
10.1007/s10639-022-11152-y
Chinese Library Classification (CLC) number
G40 [Education]
Discipline classification codes
040101; 120403
Abstract
Educational data mining is an emerging interdisciplinary research area involving both education and informatics. It has become an important research area because of the many advantages it offers educational institutions. Along these lines, various data mining techniques have been used to improve learning outcomes by exploring large-scale data from educational settings. One of the main problems is predicting students' future achievements before they take final exams, so that we can proactively help students achieve better performance and prevent dropouts. Therefore, many efforts have been made to solve the problem of student performance prediction in the context of educational data mining. In this paper, we provide readers with a comprehensive understanding of student performance prediction and compare approximately 260 studies from the last 20 years with respect to i) the major factors that most affect student performance prediction, ii) the kinds of data mining techniques used, including prediction and feature selection algorithms, and iii) frequently used data mining tools. The findings of the comprehensive analysis show that ANN and Random Forest are the most used data mining algorithms, while WEKA is a trending tool for students' performance prediction. Students' academic records and demographic factors are the best attributes for predicting performance. The study shows that irrelevant features in the dataset degrade prediction results and increase model processing time. Therefore, almost half of the studies used feature selection techniques before building prediction models. This study attempts to provide useful information to researchers interested in advancing educational data mining. It directs future researchers toward achieving highly accurate prediction results in different scenarios using different available inputs or techniques. The study also helps institutions apply data mining techniques to predict and improve student outcomes by providing additional assistance on time.
Pages: 905-971
Page count: 67