Opportunities of natural language processing for comparative judgment assessment of essays

Cited by: 0
Authors
De Vrindt, Michiel [1 ,2 ]
Tack, Anaïs [2 ,3 ]
Van den Noortgate, Wim [1 ,2 ]
Lesterhuis, Marije [4 ]
Bouwer, Renske [5 ]
Affiliations
[1] Faculty of Psychology and Educational Sciences, KU Leuven, Etienne Sabbelaan 53, Kortrijk
[2] itec, an imec research group at KU Leuven, Etienne Sabbelaan 51, Kortrijk
[3] Faculty of Arts, KU Leuven, Etienne Sabbelaan 53, Kortrijk
[4] Center for Research and Development of Health Professions Education, UMC Utrecht, Utrecht
[5] Institute for Language Sciences, Utrecht University, Trans 10, Utrecht
Source
Computers and Education: Artificial Intelligence | 2025, Vol. 8
Keywords
Automated essay scoring; Comparative judgment; Hybrid human-AI; Natural language processing; Partial-automation
DOI
10.1016/j.caeai.2025.100414
Abstract
Comparative judgment (CJ) is an assessment method commonly used to assess essay quality: assessors compare pairs of essays and judge which essay in each pair is of higher quality. A psychometric model then converts these judgments into quality scores. Although CJ yields reliable and valid scores, its widespread adoption in educational practice is hindered by its inefficiency and limited feedback capabilities. This conceptual study explores how Natural Language Processing (NLP) can address these limitations, drawing on existing NLP techniques and the limited research on their integration within CJ. More specifically, we argue that, at the start of the assessment, initial essay quality scores could be predicted from the essay texts using NLP, mitigating the cold-start problem of CJ. During the CJ assessment, NLP could be used to construct selection rules that efficiently increase the reliability of the scores while sparing assessors comparisons that are too difficult to make. After the CJ assessment, NLP could automate feedback, helping to clarify how assessors arrived at their judgments and to explain the scores to assessees (students). To support future research, we provide an overview of suitable methods based on existing research and highlight important considerations for each opportunity. Ultimately, we contend that integrating NLP into CJ can substantially improve the efficiency and transparency of the assessment method while preserving the crucial role of human assessors in evaluating writing quality. © 2025 The Author(s)
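
To make the conversion from judgments to scores concrete, the sketch below fits a Bradley-Terry model, the psychometric model most commonly used in CJ, by gradient ascent on the log-likelihood. This is a minimal illustration in Python with NumPy, not the implementation used in the paper; the learning rate and iteration count are arbitrary choices.

import numpy as np

def fit_bradley_terry(n_essays, judgments, lr=0.1, n_iter=500):
    # judgments: list of (winner_index, loser_index) tuples
    theta = np.zeros(n_essays)              # essay quality scores (logits)
    for _ in range(n_iter):
        grad = np.zeros(n_essays)
        for w, l in judgments:
            p_win = 1.0 / (1.0 + np.exp(theta[l] - theta[w]))  # P(w beats l)
            grad[w] += 1.0 - p_win          # gradient of the log-likelihood
            grad[l] -= 1.0 - p_win
        theta += lr * grad / len(judgments)
        theta -= theta.mean()               # fix the scale (sum-to-zero)
    return theta

# Example: essay 0 beats essay 1 twice; essay 1 beats essay 2 once.
scores = fit_bradley_terry(3, [(0, 1), (0, 1), (1, 2)])

Under this model, the probability that one essay is judged superior depends only on the difference between the two quality scores, which is what lets a modest number of pairwise judgments place all essays on a common scale.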
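
The cold-start idea, predicting initial quality scores from the essay texts before any comparisons have been made, could for instance be realized with standard regression on text features. The sketch below uses TF-IDF features and ridge regression from scikit-learn; the feature set and model are illustrative assumptions, not the specific NLP techniques the paper proposes.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

def predict_initial_scores(scored_texts, scored_values, new_texts):
    # Train on essays scored in earlier CJ assessments, then seed
    # the new assessment with predicted scores for unseen essays.
    vec = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
    X_train = vec.fit_transform(scored_texts)
    model = Ridge(alpha=1.0).fit(X_train, scored_values)
    return model.predict(vec.transform(new_texts))

The predicted scores would only initialize the scale; subsequent human judgments would update and correct them, keeping assessors in control of the final scores.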
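
A selection rule that balances statistical information against comparison difficulty might look as follows. The sketch prefers the pair whose current score estimates yield the highest Fisher information under the Bradley-Terry model, while skipping pairs whose scores are so close that the judgment would be too difficult; the min_gap threshold is a hypothetical parameter, not a value from the paper.

import numpy as np
from itertools import combinations

def select_pair(theta, min_gap=0.2):
    # theta: current score estimates; returns the next pair to compare.
    best, best_info = None, -np.inf
    for i, j in combinations(range(len(theta)), 2):
        gap = abs(theta[i] - theta[j])
        if gap < min_gap:               # too similar: hard to judge reliably
            continue
        p = 1.0 / (1.0 + np.exp(-gap))  # P(higher-scored essay wins)
        info = p * (1.0 - p)            # Fisher information of the judgment
        if info > best_info:
            best, best_info = (i, j), info
    return best

Because information is highest when the two scores are equal, this rule effectively selects the closest pair that still clears the difficulty threshold, trading a little statistical efficiency for more dependable judgments.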