Feedback-Generation for Programming Exercises With GPT-4

Cited by: 6
Authors
Azaiz, Imen [1]
Kiesler, Natalie [2]
Strickroth, Sven [1]
Affiliations
[1] Ludwig Maximilians Univ Munchen, Munich, Germany
[2] Nuremberg Tech, Nurnberg, Germany
Source
PROCEEDINGS OF THE 2024 CONFERENCE INNOVATION AND TECHNOLOGY IN COMPUTER SCIENCE EDUCATION, VOL 1, ITICSE 2024 | 2024
Keywords
formative feedback; personalized feedback; assessment; introductory programming; Large Language Models; LLMs; GPT-4; Turbo; benchmarking;
DOI
10.1145/3649217.3653594
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Ever since Large Language Models (LLMs) and related applications have become broadly available, several studies investigated their potential for assisting educators and supporting students in higher education. LLMs such as Codex, GPT-3.5, and GPT-4 have shown promising results in the context of large programming courses, where students can benefit from feedback and hints if provided timely and at scale. This paper explores the quality of GPT-4 Turbo's generated output for prompts containing both the programming task specification and a student's submission as input. Two assignments from an introductory programming course were selected, and GPT-4 was asked to generate feedback for 55 randomly chosen, authentic student programming submissions. The output was qualitatively analyzed regarding correctness, personalization, fault localization, and other features identified in the material. Compared to prior work and analyses of GPT-3.5, GPT-4 Turbo shows notable improvements. For example, the output is more structured and consistent. GPT-4 Turbo can also accurately identify invalid casing in student programs' output. In some cases, the feedback also includes the output of the student program. At the same time, inconsistent feedback was noted such as stating that the submission is correct but an error needs to be fixed. The present work increases our understanding of LLMs' potential, limitations, and how to integrate them into e-assessment systems, pedagogical scenarios, and instructing students who are using applications based on GPT-4.
Pages: 31-37
Page count: 7
Related papers
50 in total
  • [41] Once Upon a GPT-4: Enhancing Diversity in Automated Reading Comprehension Story Generation with Classic Tales
    Shankarnarayanan, Aadhith
    Syed, Taufiq
    Shapsough, Salsabeel
    Zualkernan, Imran
    2024 IEEE INTERNATIONAL CONFERENCE ON ADVANCED LEARNING TECHNOLOGIES, ICALT 2024, 2024, : 196 - 200
  • [42] Using the Retrieval-Augmented Generation Technique to Improve the Performance of GPT-4 in Answering Quran Questions
    Alnefaie, Sarah
    Atwell, Eric
    Alsalka, Mohammed Ammar
    2024 6TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING, ICNLP 2024, 2024, : 377 - 381
  • [43] Case study identification with GPT-4 and implications for mapping studies
    Petersen, Kai
    INFORMATION AND SOFTWARE TECHNOLOGY, 2024, 171
  • [44] Assessing GPT-4's accuracy in answering clinical pharmacological questions on pain therapy
    Stroop, Anna
    Stroop, Tabea
    Alsofy, Samer Zawy
    Wegner, Moritz
    Nakamura, Makoto
    Stroop, Ralf
    BRITISH JOURNAL OF CLINICAL PHARMACOLOGY, 2025,
  • [45] GPT-3.5 Turbo and GPT-4 Turbo in Title and Abstract Screening for Systematic Reviews
    Oami, Takehiko
    Okada, Yohei
    Nakada, Taka-aki
    JMIR MEDICAL INFORMATICS, 2025, 13
  • [46] Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard
    Farhat, Faiza
    Chaudhry, Beenish Moalla
    Nadeem, Mohammad
    Sohail, Shahab Saquib
    Madsen, Dag Oivind
    JMIR MEDICAL EDUCATION, 2024, 10
  • [47] Evaluating GPT-4's Cognitive Functions Through the Bloom Taxonomy: Insights and Clarifications
    Herrmann-Werner, Anne
    Festl-Wietek, Teresa
    Holderried, Friederike
    Herschbach, Lea
    Griewatz, Jan
    Masters, Ken
    Zipfel, Stephan
    Mahling, Moritz
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2024, 26
  • [48] Revolutionizing Neurosurgery with GPT-4: A Leap Forward or Ethical Conundrum?
    Wenbo Li
    Mingshu Fu
    Siyu Liu
    Hongyu Yu
    Annals of Biomedical Engineering, 2023, 51 : 2105 - 2112
  • [49] Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study
    Takagi, Soshi
    Watari, Takashi
    Erabi, Ayano
    Sakaguchi, Kota
    JMIR MEDICAL EDUCATION, 2023, 9
  • [50] GPT-4 Reignites a Hot Topic, Probing the Ethical Boundaries of Technology
    张渺
    科学大观园, 2023, (08) : 54 - 57