Opportunities of natural language processing for comparative judgment assessment of essays

Authors
De Vrindt, Michiel [1 ,2 ]
Tack, Anaïs [2 ,3 ]
Van den Noortgate, Wim [1 ,2 ]
Lesterhuis, Marije [4 ]
Bouwer, Renske [5 ]
Affiliations
[1] Faculty of Psychology and Educational Sciences, KU Leuven, Etienne Sabbelaan 53, Kortrijk
[2] itec, an imec research group at KU Leuven, Etienne Sabbelaan 51, Kortrijk
[3] Faculty of Arts, KU Leuven, Etienne Sabbelaan 53, Kortrijk
[4] Center for Research and Development of Health Professions Education, UMC Utrecht, Utrecht
[5] Institute for Language Sciences, Utrecht University, Trans 10, Utrecht
Source
Computers and Education: Artificial Intelligence | 2025, Vol. 8
Keywords
Automated essay scoring; Comparative judgment; Hybrid human-AI; Natural language processing; Partial-automation
DOI
10.1016/j.caeai.2025.100414
Abstract
Comparative judgment (CJ) is an assessment method commonly used for assessing essay quality, in which assessors compare pairs of essays and judge which essay in each pair is superior. A psychometric model then converts the judgments into quality scores. Although CJ yields reliable and valid scores, its widespread implementation in educational practice is hindered by its inefficiency and limited feedback capabilities. This conceptual study explores how Natural Language Processing (NLP) can address these limitations, drawing on existing NLP techniques and the very limited research on their integration within CJ. More specifically, we argue that, at the start of the assessment, initial essay quality scores could be predicted from the essay texts using NLP, mitigating the cold-start problem of CJ. During the CJ assessment, NLP could be used to construct selection rules that efficiently increase the reliability of the scores while sparing assessors overly difficult comparisons. After the CJ assessment, NLP could automate feedback, helping to clarify how assessors arrived at their judgments and to explain the scores to assessees (students). To support future research, we provide an overview of appropriate methods based on existing research and highlight important considerations for each opportunity. Ultimately, we contend that integrating NLP into CJ can significantly improve the efficiency and transparency of the assessment method while preserving the crucial role of human assessors in evaluating writing quality. © 2025 The Author(s)
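The abstract leaves the psychometric model unspecified; in the CJ literature this is typically a Bradley-Terry(-Luce) model. The Python sketch below is an illustration under that assumption, not the authors' implementation: the names bt_prob, fit_bt, and select_pair, and all parameter values, are hypothetical. It shows how pairwise judgments can be converted into quality scores, how NLP-predicted scores could warm-start the estimation (one way to ease the cold-start problem), and one toy selection rule for picking informative pairs.

    import numpy as np

    def bt_prob(theta_i, theta_j):
        # Bradley-Terry probability that essay i is judged better than essay j
        return 1.0 / (1.0 + np.exp(-(theta_i - theta_j)))

    def fit_bt(n_essays, judgments, theta_init=None, lr=0.1, n_iter=500):
        # Fit quality scores by gradient ascent on the Bradley-Terry
        # log-likelihood; theta_init allows a warm start from NLP-predicted
        # scores, mitigating the cold start described in the abstract.
        theta = (np.zeros(n_essays) if theta_init is None
                 else np.asarray(theta_init, dtype=float).copy())
        for _ in range(n_iter):
            grad = np.zeros(n_essays)
            for winner, loser in judgments:
                p = bt_prob(theta[winner], theta[loser])
                grad[winner] += 1.0 - p
                grad[loser] -= 1.0 - p
            theta += lr * grad
            theta -= theta.mean()  # pin the scale's origin (identifiability)
        return theta

    def select_pair(theta, n_compared):
        # Toy selection rule: among the least-compared essays, choose the two
        # with the closest current scores, i.e. the most informative (and not
        # too lopsided) comparison for the next assessor.
        cand = np.argsort(n_compared)[:max(4, len(theta) // 4)]
        pairs = [(int(a), int(b))
                 for i, a in enumerate(cand) for b in cand[i + 1:]]
        return min(pairs, key=lambda ij: abs(theta[ij[0]] - theta[ij[1]]))

    # Demo: simulate 60 judgments over 12 essays, seeded with noisy
    # "NLP-predicted" scores standing in for an automated essay scorer.
    rng = np.random.default_rng(0)
    true_quality = rng.normal(size=12)
    nlp_scores = true_quality + rng.normal(scale=0.5, size=12)
    judgments = []
    for _ in range(60):
        i, j = rng.choice(12, size=2, replace=False)
        better_i = rng.random() < bt_prob(true_quality[i], true_quality[j])
        judgments.append((i, j) if better_i else (j, i))
    theta = fit_bt(12, judgments, theta_init=nlp_scores)
    print(select_pair(theta, n_compared=np.zeros(12)))

In this sketch the warm start only changes the starting point of the optimization, so the human judgments still dominate the final scores; that mirrors the paper's emphasis on keeping assessors, not the model, in charge of the quality judgments.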