Assessing ChatGPT's Mastery of Bloom's Taxonomy Using Psychosomatic Medicine Exam Questions: Mixed-Methods Study

Cited by: 16
Authors
Herrmann-Werner, Anne [1, 2]
Festl-Wietek, Teresa [1]
Holderried, Friederike [1, 3]
Herschbach, Lea [1]
Griewatz, Jan [1]
Masters, Ken [4]
Zipfel, Stephan [2]
Mahling, Moritz [1, 5]
Affiliations
[1] Univ Tubingen, Tubingen Inst Med Educ, Fac Med, Elfriede Aulhorn Str 10, D-72076 Tubingen, Germany
[2] Univ Hosp Tubingen, Dept Psychosomat Med & Psychotherapy, Tubingen, Germany
[3] Univ Hosp Tubingen, Univ Dept Anesthesiol & Intens Care Med, Tubingen, Germany
[4] Sultan Qaboos Univ, Coll Med & Hlth Sci, Med Educ & Informat Dept, Muscat, Oman
[5] Univ Hosp Tubingen, Dept Diabetol Endocrinol Nephrol, Sect Nephrol & Hypertens, Tubingen, Germany
Keywords
answer; artificial intelligence; assessment; Bloom's taxonomy; ChatGPT; classification; error; exam; examination; generative; GPT-4; Generative Pre-trained Transformer 4; language model; learning outcome; LLM; MCQ; medical education; medical exam; multiple-choice question; natural language processing; NLP; psychosomatic; question; response; taxonomy; EDUCATION
DOI
10.2196/52113
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services management)]
Abstract
Background: Large language models such as GPT-4 (Generative Pre-trained Transformer 4) are increasingly used in medicine and medical education. However, these models are prone to "hallucinations" (ie, outputs that seem convincing while being factually incorrect). It is currently unknown how these errors relate to the different cognitive levels defined in Bloom's taxonomy.

Objective: This study aims to explore how GPT-4 performs in terms of Bloom's taxonomy using psychosomatic medicine exam questions.

Methods: We used a large data set of psychosomatic medicine multiple-choice questions (N=307) with real-world results derived from medical school exams. GPT-4 answered the multiple-choice questions using 2 distinct prompt versions: detailed and short. The answers were analyzed using both a quantitative and a qualitative approach. Focusing on incorrectly answered questions, we categorized reasoning errors according to the hierarchical framework of Bloom's taxonomy.

Results: GPT-4 answered the exam questions with a high success rate: 93% (284/307) for the detailed prompt and 91% (278/307) for the short prompt. Questions answered correctly by GPT-4 had a statistically significantly higher difficulty than questions answered incorrectly (P=.002 for the detailed prompt and P<.001 for the short prompt). Independent of the prompt, GPT-4's lowest exam performance was 78.9% (15/19), thereby always surpassing the "pass" threshold. Our qualitative analysis of incorrect answers, based on Bloom's taxonomy, showed that errors were primarily in the "remember" (29/68) and "understand" (23/68) cognitive levels; specific issues arose in recalling details, understanding conceptual relationships, and adhering to standardized guidelines.

Conclusions: GPT-4 demonstrated a remarkable success rate when confronted with psychosomatic medicine multiple-choice exam questions, aligning with previous findings. When evaluated through Bloom's taxonomy, our data revealed that GPT-4 occasionally ignored specific facts (remember), provided illogical reasoning (understand), or failed to apply concepts to a new situation (apply). These errors, which were confidently presented, could be attributed to inherent model biases and the tendency to generate outputs that maximize likelihood.
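
The proportions quoted in the Results can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' analysis code: the counts are copied from the abstract, while the dictionary `REPORTED` and the helper `percent` are hypothetical names introduced only for this illustration.

```python
# Counts (numerator, denominator) as reported in the abstract.
# The labels and the names REPORTED / percent are illustrative assumptions.
REPORTED = {
    "detailed prompt, correct answers": (284, 307),
    "short prompt, correct answers": (278, 307),
    "lowest single-exam performance": (15, 19),
    "errors at the 'remember' level": (29, 68),
    "errors at the 'understand' level": (23, 68),
}


def percent(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100 * part / total, 1)


for label, (part, total) in REPORTED.items():
    print(f"{label}: {part}/{total} = {percent(part, total)}%")
```

Rounded to whole numbers, the first two values match the 93% and 91% given in the abstract, and the third reproduces the reported 78.9%.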
Pages: 13