Feedback-Generation for Programming Exercises With GPT-4

Times cited: 6

Authors:

Azaiz, Imen [1]

Kiesler, Natalie [2]

Strickroth, Sven [1]

Affiliations:

[1] Ludwig Maximilians Univ Munchen, Munich, Germany

[2] Nuremberg Tech, Nurnberg, Germany

Source:

PROCEEDINGS OF THE 2024 CONFERENCE INNOVATION AND TECHNOLOGY IN COMPUTER SCIENCE EDUCATION, VOL 1, ITICSE 2024 | 2024

Keywords:

formative feedback; personalized feedback; assessment; introductory programming; Large Language Models; LLMs; GPT-4; Turbo; benchmarking;

DOI:

10.1145/3649217.3653594

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Ever since Large Language Models (LLMs) and related applications have become broadly available, several studies have investigated their potential for assisting educators and supporting students in higher education. LLMs such as Codex, GPT-3.5, and GPT-4 have shown promising results in the context of large programming courses, where students can benefit from feedback and hints if these are provided in a timely manner and at scale. This paper explores the quality of GPT-4 Turbo's generated output for prompts containing both the programming task specification and a student's submission as input. Two assignments from an introductory programming course were selected, and GPT-4 was asked to generate feedback for 55 randomly chosen, authentic student programming submissions. The output was qualitatively analyzed regarding correctness, personalization, fault localization, and other features identified in the material. Compared to prior work and analyses of GPT-3.5, GPT-4 Turbo shows notable improvements. For example, its output is more structured and consistent. GPT-4 Turbo can also accurately identify invalid casing in the output of student programs. In some cases, the feedback also includes the output of the student program. At the same time, inconsistent feedback was noted, such as stating that a submission is correct while also saying that an error needs to be fixed. The present work increases our understanding of LLMs' potential and limitations, of how to integrate them into e-assessment systems and pedagogical scenarios, and of how to instruct students who are using applications based on GPT-4.
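The abstract describes prompts that combine the programming task specification with a student's submission. As a minimal sketch only, assuming a plain-text layout, the following shows how such an input could be assembled. The function name `build_feedback_prompt`, the instruction sentence, and the section labels are hypothetical; the study's actual prompt wording is not given in this record.

```python
def build_feedback_prompt(task_spec: str, submission: str) -> str:
    """Combine a task specification and a student's code into one prompt.

    Hypothetical layout for illustration; not the prompt used in the study.
    """
    # Hypothetical instruction line asking the model for formative feedback.
    instruction = ("Please give formative feedback on the student's "
                   "submission for the programming task below.")
    # The task specification comes before the student's code so the model
    # can judge the submission against the stated requirements.
    return "\n\n".join([
        instruction,
        "Task specification:\n" + task_spec.strip(),
        "Student submission:\n" + submission.rstrip(),
    ])


if __name__ == "__main__":
    prompt = build_feedback_prompt(
        "Print the sum of two integers read from standard input.",
        "a = int(input())\nb = int(input())\nprint(a + b)",
    )
    print(prompt)
```

Keeping the specification and the submission in clearly labeled sections is one simple way to let the model relate each part of the student's code to the requirements it is meant to satisfy.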


Pages: 31-37

Number of pages: 7
