Missing data analysis using machine learning methods to predict the performance of technical students

Cited by: 3
Authors
Melo Junior, Gilberto de [1]
Alcala, Symone G. Soares [2]
Furriel, Geovanne Pereira [1]
Vieira, Silvio L. [1]
Affiliations
[1] Univ Fed Goias, Elect & Comp Engn, Goiania, Go, Brazil
[2] Univ Fed Goias, Fac Sci & Technol, Goiania, Go, Brazil
Source
REVISTA BRASILEIRA DE COMPUTACAO APLICADA | 2020 / Vol. 12 / No. 02
Keywords
Missing Data Treatment Methods; Machine Learning; Evaluation of Algorithms; Classification
DOI
10.5335/rbca.v12i2.10565
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Machine learning (ML) has become an emerging technology able to solve problems in many areas, including education, medicine, robotics and aerospace. ML is a field of artificial intelligence that designs computational models able to learn from data. However, developing an ML model requires ensuring data quality, since real-world data is often incomplete, noisy and inconsistent. This paper evaluates state-of-the-art missing data treatment methods using ML algorithms to classify the performance of technical high school students at the Federal Institute of Goias in Brazil. The aim is to provide an efficient computational tool that supports educational performance by allowing educators to identify students who tend to fail. The results indicate that the ignoring and discarding method outperforms the other missing data treatment methods. Moreover, the tests reveal that Sequential Minimal Optimization, Neural Networks and Bagging outperform the other ML algorithms, such as Naive Bayes and Decision Tree, in terms of classification accuracy.
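As a rough illustration of what "ignoring and discarding" means in practice, the sketch below contrasts it with mean imputation on a toy dataset. The records, the feature names (`grade`, `attendance`, `failed`) and the choice of mean imputation as the comparison method are assumptions made for illustration only; the abstract does not name the other treatment methods or describe the actual dataset.

```python
from statistics import mean

# Hypothetical student records (not from the paper); None marks a missing value.
records = [
    {"grade": 7.5, "attendance": 0.90, "failed": 0},
    {"grade": None, "attendance": 0.60, "failed": 1},
    {"grade": 4.0, "attendance": None, "failed": 1},
    {"grade": 8.0, "attendance": 0.95, "failed": 0},
]
FEATURES = ("grade", "attendance")  # assumed feature names


def discard_incomplete(rows):
    """Ignoring and discarding: keep only rows with no missing feature."""
    return [r for r in rows if all(r[f] is not None for f in FEATURES)]


def impute_mean(rows):
    """Mean imputation (assumed comparison method): fill each missing
    feature with the mean of that feature's observed values."""
    means = {f: mean(r[f] for r in rows if r[f] is not None) for f in FEATURES}
    return [
        {**r, **{f: means[f] if r[f] is None else r[f] for f in FEATURES}}
        for r in rows
    ]


complete = discard_incomplete(records)  # fewer rows, no invented values
imputed = impute_mean(records)          # all rows kept, gaps filled

print(len(complete), "rows after discarding;", len(imputed), "rows after imputation")
```

Discarding shrinks the training set but leaves every remaining value genuine, whereas imputation keeps all rows at the cost of introducing estimated values; either output could then be passed to a classifier for comparison.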
Pages: 134-143
Page count: 10