Is ChatGPT 'ready' to be a learning tool for medical undergraduates and will it perform equally in different subjects? Comparative study of ChatGPT performance in tutorial and case-based learning questions in physiology and biochemistry

Cited: 4
Authors
Luke, W. A. Nathasha V. [1 ,4 ]
Chong, Lee Seow [2 ]
Ban, Kenneth H. [2 ]
Wong, Amanda H. [1 ]
Xiong, Chen Zhi [1 ,3 ]
Shing, Lee Shuh [3 ]
Taneja, Reshma [1 ]
Samarasekera, Dujeepa D. [3 ]
Yap, Celestial T. [1 ,4 ]
Affiliations
[1] Natl Univ Singapore, Yong Loo Lin Sch Med, Dept Physiol, Singapore, Singapore
[2] Natl Univ Singapore, Yong Loo Lin Sch Med, Dept Biochem, Singapore, Singapore
[3] Natl Univ Singapore, Ctr Med Educ, Yong Loo Lin Sch Med, Singapore, Singapore
[4] Natl Univ Singapore, 2 Med Dr, MD 9, Singapore 117593, Singapore
Keywords
ChatGPT; GPT-3.5; GPT-4; generative AI (artificial intelligence); LLM (large language model); physiology; biochemistry
DOI
10.1080/0142159X.2024.2308779
Chinese Library Classification (CLC)
G40 [Education]
Discipline codes
040101; 120403
Abstract
Purpose: Generative AI will become an integral part of education in the future. The potential of this technology in different disciplines should be identified to promote effective adoption. This study evaluated the performance of ChatGPT in tutorial and case-based learning questions in physiology and biochemistry for medical undergraduates. Our study mainly focused on the performance of the GPT-3.5 version, while a subgroup was comparatively assessed on GPT-3.5 and GPT-4 performances.
Materials and methods: Answers were generated in GPT-3.5 for 44 modified essay questions (MEQs) in physiology and 43 MEQs in biochemistry. Each answer was graded by two independent examiners. Subsequently, a subset of 15 questions from each subject was selected to represent different score categories of the GPT-3.5 answers; responses were generated in GPT-4 and graded.
Results: The mean score for physiology answers was 74.7 (SD 25.96). GPT-3.5 demonstrated a statistically significant (p = .009) superior performance on lower-order questions of Bloom's taxonomy in comparison to higher-order questions. Deficiencies in the application of physiological principles in a clinical context were noted as a drawback. Scores in biochemistry were relatively lower, with a mean score of 59.3 (SD 26.9) for GPT-3.5. There was no statistically significant difference in the scores for higher- and lower-order questions of Bloom's taxonomy. The deficiencies highlighted were a lack of in-depth explanations and precision. The subset of questions on which GPT-4 and GPT-3.5 were compared demonstrated a better overall performance by GPT-4 in both subjects. This difference between GPT-3.5 and GPT-4 performance was statistically significant in biochemistry but not in physiology.
Conclusions: The differences in performance between the two versions, GPT-3.5 and GPT-4, across the disciplines are noteworthy. Educators and students should understand the strengths and limitations of this technology in different fields in order to integrate it effectively into teaching and learning.
Pages: 1441-1447
Page count: 7