Combining machine translation and automated scoring in international large-scale assessments

Times cited: 1
Authors
Jung, Ji Yoon [1]
Tyack, Lillian [1]
von Davier, Matthias [1]
Affiliation
[1] Boston College, TIMSS & PIRLS International Study Center, 188 Beacon St, Chestnut Hill, MA 02467, USA
Keywords
Automated scoring; Artificial intelligence; Artificial neural networks; Machine translation; Google Translate; ChatGPT; International large-scale assessments; TIMSS
DOI
10.1186/s40536-024-00199-7
Chinese Library Classification (CLC) number
G40 [Education]
Subject classification codes
040101; 120403
Abstract
Background: Artificial intelligence (AI) is rapidly changing communication and technology-driven content creation, and it is increasingly used in education. Despite these advances, AI-powered automated scoring in international large-scale assessments (ILSAs) remains largely unexplored because of the challenges of scoring large volumes of multilingual responses. However, because they are low-stakes, ILSAs are an ideal testing ground for innovation and for exploring new methodologies.

Methods: This study proposes combining state-of-the-art machine translation (i.e., Google Translate and ChatGPT) with artificial neural networks (ANNs) to mitigate two key concerns of human scoring: inconsistency and high cost. We applied AI-based automated scoring to multilingual student responses from eight countries in six languages, using six constructed-response items from TIMSS 2019.

Results: Automated scoring performed comparably to human scoring, especially when the ANNs were trained and tested on ChatGPT-translated responses. Furthermore, the psychometric characteristics derived from machine scores were generally similar to those obtained from human scores. These results can be considered supporting evidence for the validity of automated scoring in survey assessments.

Conclusions: This study highlights that automated scoring integrated with recent machine translation holds great promise for consistent and resource-efficient scoring in ILSAs.
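The abstract reports that machine scores were comparable to human scores but does not name the agreement statistic. This is a minimal sketch under one assumption: human-machine agreement is summarized with exact percent agreement and Cohen's kappa, optionally quadratic-weighted for partial-credit items. The function names and the short 0/1/2 score vectors are hypothetical illustrations. They are not code or data from the study.

```python
import math
from collections import Counter


def _check_pairs(human, machine):
    # Both raters must score the same, non-empty set of responses.
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")


def exact_agreement(human, machine):
    """Share of responses on which human and machine assign the same score."""
    _check_pairs(human, machine)
    return sum(h == m for h, m in zip(human, machine)) / len(human)


def cohen_kappa(human, machine, weights=None):
    """Cohen's kappa between two raters; weights=None (unweighted) or "quadratic".

    Quadratic weights assume ordered, numeric score points (e.g. 0/1/2 partial credit).
    """
    _check_pairs(human, machine)
    if weights not in (None, "quadratic"):
        raise ValueError("weights must be None or 'quadratic'")

    cats = sorted(set(human) | set(machine))
    lo, hi = cats[0], cats[-1]
    n = len(human)

    def penalty(a, b):
        # Disagreement weight: 0 on the diagonal; 1 off it when unweighted,
        # squared scaled score distance when quadratic.
        if a == b:
            return 0.0
        if weights is None:
            return 1.0
        return ((a - b) / (hi - lo)) ** 2

    pairs = Counter(zip(human, machine))            # observed (human, machine) pairs
    h_marg, m_marg = Counter(human), Counter(machine)  # each rater's score distribution

    # Observed vs. chance-expected (product of marginals) weighted disagreement.
    obs_dis = sum(penalty(a, b) * pairs[(a, b)] / n for a in cats for b in cats)
    exp_dis = sum(penalty(a, b) * h_marg[a] * m_marg[b] / n**2
                  for a in cats for b in cats)
    if exp_dis == 0:
        return math.nan  # undefined: both raters used one identical category
    return 1.0 - obs_dis / exp_dis


if __name__ == "__main__":
    # Hypothetical 0/1/2 scores for eight responses (illustration only).
    human = [0, 1, 2, 2, 1, 0, 1, 2]
    machine = [0, 1, 2, 1, 1, 0, 2, 2]
    print(exact_agreement(human, machine),
          round(cohen_kappa(human, machine), 3),
          round(cohen_kappa(human, machine, weights="quadratic"), 3))
    # prints: 0.75 0.619 0.795
```

Kappa corrects raw agreement for the agreement expected by chance from each rater's score distribution. For that reason it is usually reported alongside percent agreement. The quadratic variant penalizes a 0-versus-2 disagreement more heavily than a 1-versus-2 disagreement.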
Pages: 18