Feedback-Generation for Programming Exercises With GPT-4

Cited by: 6
Authors
Azaiz, Imen [1]
Kiesler, Natalie [2]
Strickroth, Sven [1]
Affiliations
[1] Ludwig Maximilians Univ Munchen, Munich, Germany
[2] Nuremberg Tech, Nurnberg, Germany
Source
PROCEEDINGS OF THE 2024 CONFERENCE INNOVATION AND TECHNOLOGY IN COMPUTER SCIENCE EDUCATION, VOL 1, ITICSE 2024 | 2024
Keywords
formative feedback; personalized feedback; assessment; introductory programming; Large Language Models; LLMs; GPT-4; Turbo; benchmarking;
DOI
10.1145/3649217.3653594
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Ever since Large Language Models (LLMs) and related applications have become broadly available, several studies have investigated their potential for assisting educators and supporting students in higher education. LLMs such as Codex, GPT-3.5, and GPT-4 have shown promising results in the context of large programming courses, where students can benefit from feedback and hints if provided in a timely manner and at scale. This paper explores the quality of GPT-4 Turbo's generated output for prompts containing both the programming task specification and a student's submission as input. Two assignments from an introductory programming course were selected, and GPT-4 was asked to generate feedback for 55 randomly chosen, authentic student programming submissions. The output was qualitatively analyzed regarding correctness, personalization, fault localization, and other features identified in the material. Compared to prior work and analyses of GPT-3.5, GPT-4 Turbo shows notable improvements. For example, the output is more structured and consistent. GPT-4 Turbo can also accurately identify invalid casing in student programs' output. In some cases, the feedback also includes the output of the student program. At the same time, inconsistent feedback was noted, such as stating that the submission is correct but an error needs to be fixed. The present work increases our understanding of LLMs' potential, limitations, and how to integrate them into e-assessment systems, pedagogical scenarios, and instructing students who are using applications based on GPT-4.
Pages: 31-37
Number of pages: 7