Assessing the proficiency of large language models in automatic feedback generation: An evaluation study

Cited by: 0

Authors:

Dai, Wei [1]
Tsai, Yi-Shan [1]
Lin, Jionghao [2]
Aldino, Ahmad [1]
Jin, Hua [1]
Li, Tongguang [1]
Gašević, Dragan [1]
Chen, Guanliang [1]

Affiliations:

[1] Monash University, Melbourne

Source:

Computers and Education: Artificial Intelligence | 2024 / Vol. 7

Keywords:

Assessment feedback; Automated feedback system; Feedback effectiveness; Feedback generation; Generative pre-trained transformer; Learning analytics

DOI:

10.1016/j.caeai.2024.100299

Abstract:

Assessment feedback is important to student learning. Learning analytics (LA) powered by artificial intelligence shows great potential to help instructors with the laborious task of providing feedback. Inspired by recent advances in Generative Pre-trained Transformer (GPT) models, we conducted a study to examine the extent to which GPT models can advance existing knowledge of LA-supported feedback systems towards more efficient feedback provision. Specifically, our study explored the ability of two GPT models, GPT-3.5 (ChatGPT) and GPT-4, to generate assessment feedback on students' writing assessment tasks with open-ended topics in a data science-related course. This type of task is common in higher education. We compared the feedback generated by GPT-3.5 and GPT-4 with the feedback provided by human instructors on three criteria:

- readability;
- effectiveness, meaning whether the content contains effective feedback components;
- reliability, meaning whether student performance is assessed correctly.

Results showed that (1) both GPT-3.5 and GPT-4 generated more readable feedback, with greater consistency, than human instructors; (2) GPT-4 outperformed both GPT-3.5 and human instructors in providing feedback that addressed effective feedback dimensions, including feeding-up, feeding-forward, process level, and self-regulation level; and (3) GPT-4 demonstrated higher feedback reliability than GPT-3.5. Based on our findings, we discuss the potential opportunities and challenges of using GPT models for assessment feedback generation. © 2024 The Authors
