Feedback-Generation for Programming Exercises With GPT-4

Cited by: 6
Authors
Azaiz, Imen [1]
Kiesler, Natalie [2]
Strickroth, Sven [1]
Affiliations
[1] Ludwig Maximilians Univ Munchen, Munich, Germany
[2] Nuremberg Tech, Nurnberg, Germany
Source
PROCEEDINGS OF THE 2024 CONFERENCE INNOVATION AND TECHNOLOGY IN COMPUTER SCIENCE EDUCATION, VOL 1, ITICSE 2024 | 2024
Keywords
formative feedback; personalized feedback; assessment; introductory programming; Large Language Models; LLMs; GPT-4; Turbo; benchmarking;
DOI
10.1145/3649217.3653594
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification code
081203; 0835
Abstract
Ever since Large Language Models (LLMs) and related applications became broadly available, several studies have investigated their potential for assisting educators and supporting students in higher education. LLMs such as Codex, GPT-3.5, and GPT-4 have shown promising results in the context of large programming courses, where students can benefit from feedback and hints if these are provided in a timely manner and at scale. This paper explores the quality of GPT-4 Turbo's generated output for prompts containing both the programming task specification and a student's submission as input. Two assignments from an introductory programming course were selected, and GPT-4 was asked to generate feedback for 55 randomly chosen, authentic student programming submissions. The output was qualitatively analyzed regarding correctness, personalization, fault localization, and other features identified in the material. Compared to prior work and analyses of GPT-3.5, GPT-4 Turbo shows notable improvements. For example, the output is more structured and consistent. GPT-4 Turbo can also accurately identify invalid casing in student programs' output. In some cases, the feedback also includes the output of the student program. At the same time, inconsistent feedback was noted, such as stating that the submission is correct while also stating that an error needs to be fixed. The present work increases our understanding of LLMs' potential and limitations, and of how to integrate them into e-assessment systems and pedagogical scenarios and how to instruct students who are using applications based on GPT-4.
Pages: 31-37
Page count: 7