Predicting science achievement scores with machine learning algorithms: a case study of OECD PISA 2015-2018 data

Cited by: 9
Authors
Acisli-Celik, Sibel [1]
Yesilkanat, Cafer Mert [1]
Affiliations
[1] Artvin Coruh Univ, Sci Teaching Dept, Artvin, Turkiye
Keywords
Artificial intelligence; Interdisciplinary; transdisciplinary; Random forest; XGBoost; SUPPORT VECTOR REGRESSION; RANDOM SUBSPACE METHOD; EDUCATION POLICY; RANDOM FORESTS; STUDENTS; PERFORMANCE; ASSESSMENTS; VARIABLES; LITERACY; MODELS;
DOI
10.1007/s00521-023-08901-6
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This study examined how well machine learning methods trained on PISA 2015 data predict the science achievement scores of students who took the next cycle of the exam, PISA 2018, as well as the average science scores of the countries. The research sample consists of 67,329 students who took the PISA 2015 exam in 13 randomly selected countries (Brazil, Chinese Taipei, Dominican Republic, Estonia, Finland, Hungary, Italy, Japan, Lithuania, Luxembourg, Peru, Singapore, Turkiye). Four algorithms were used: multiple linear regression, support vector regression, random forest, and extreme gradient boosting (XGBoost). For each country, a randomly selected part of the PISA 2015 data served as training data and the remaining part as test data for evaluating model performance. The XGBoost algorithm showed the best performance in estimating both the PISA 2015 test data and the PISA 2018 science achievement scores in all countries studied. With this algorithm, the student-level PISA 2018 science achievement scores were estimated most accurately for Luxembourg (r = 0.600, RMSE = 75.06, MAE = 59.97) and least accurately for Finland (r = 0.467, RMSE = 79.38, MAE = 63.24). In addition, the average PISA 2018 science scores of all the countries studied were estimated with the XGBoost algorithm with very high accuracy.
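The three evaluation measures reported in the abstract (Pearson's r, RMSE, MAE) can be illustrated with a minimal sketch. This is not the authors' code: the function names and the five observed/predicted scores are hypothetical and chosen only to show how each measure is computed from a set of observed and predicted scores.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted scores."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error between observed and predicted scores."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted scores."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

# Hypothetical science scores (illustration only, not PISA data)
observed_scores = [400.0, 450.0, 500.0, 550.0, 600.0]
predicted_scores = [420.0, 440.0, 510.0, 530.0, 610.0]

r_value = pearson_r(observed_scores, predicted_scores)
rmse_value = rmse(observed_scores, predicted_scores)
mae_value = mae(observed_scores, predicted_scores)
print(f"r = {r_value:.3f}, RMSE = {rmse_value:.2f}, MAE = {mae_value:.2f}")
```

A higher r together with lower RMSE and MAE indicates closer agreement between predicted and observed scores, which is how the Luxembourg and Finland figures above can be compared.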
Pages: 21201-21228
Number of pages: 28