A Language Model-Powered Simulated Patient With Automated Feedback for History Taking: Prospective Study

Cited: 5
Authors
Holderried, Friederike [1 ]
Stegemann-Philipps, Christian [1 ]
Herrmann-Werner, Anne [1 ]
Festl-Wietek, Teresa [1 ]
Holderried, Martin [2 ]
Eickhoff, Carsten [3 ]
Mahling, Moritz [1 ,2 ]
Affiliations
[1] Univ Tubingen, Tubingen Inst Med Educ TIME, Med Fac, Elfriede Aulhorn Str 10, D-72076 Tubingen, Germany
[2] Univ Hosp Tubingen, Dept Med Dev Proc & Qual Management, Tubingen, Germany
[3] Univ Tubingen, Inst Appl Med Informat, Tubingen, Germany
Source
JMIR MEDICAL EDUCATION | 2024 / Vol. 10
Keywords
virtual patients communication; communication skills; technology enhanced education; TEL; medical education; ChatGPT; GPT; LLM; LLMs; NLP; natural language processing; machine learning; artificial intelligence; language model; language models; communication; relationship; relationships; chatbot; chatbots; conversational agent; conversational agents; history; histories; simulated; student; students; interaction; interactions; COMMUNICATION-SKILLS; PHYSICAL-EXAMINATION; PERFORMANCE; FEEDBACK; LEARNERS;
DOI
10.2196/59213
CLC classification
G40 [Education];
Subject classification codes
040101; 120403;
Abstract
Background: Although history taking is fundamental for diagnosing medical conditions, teaching and providing feedback on the skill can be challenging due to resource constraints. Virtual simulated patients and web-based chatbots have thus emerged as educational tools, with recent advancements in artificial intelligence (AI), such as large language models (LLMs), enhancing their realism and potential to provide feedback.
Objective: In our study, we aimed to evaluate the effectiveness of a Generative Pretrained Transformer (GPT) 4 model to provide structured feedback on medical students' performance in history taking with a simulated patient.
Methods: We conducted a prospective study involving medical students performing history taking with a GPT-powered chatbot. To that end, we designed a chatbot to simulate patients' responses and provide immediate feedback on the comprehensiveness of the students' history taking. Students' interactions with the chatbot were analyzed, and feedback from the chatbot was compared with feedback from a human rater. We measured interrater reliability and performed a descriptive analysis to assess the quality of feedback.
Results: Most of the study's participants were in their third year of medical school. A total of 1894 question-answer pairs from 106 conversations were included in our analysis. GPT-4's role-play and responses were medically plausible in more than 99% of cases. Interrater reliability between GPT-4 and the human rater showed "almost perfect" agreement (Cohen kappa=0.832). Lower agreement (kappa<0.6) in 8 of 45 feedback categories highlighted topics on which the model's assessments were overly specific or diverged from human judgment.
Conclusions: The GPT model was effective in providing structured feedback on history-taking dialogs provided by medical students. Although we uncovered some limitations regarding the specificity of feedback for certain feedback categories, the overall high agreement with human raters suggests that LLMs can be a valuable tool for medical education. Our findings thus advocate the careful integration of AI-driven feedback mechanisms in medical training and highlight important aspects when LLMs are used in that context.
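The interrater-reliability figure in the Results (Cohen kappa=0.832) is a standard chance-corrected agreement statistic over paired ratings. A minimal sketch of how it is computed, assuming two raters assign the same categorical label (e.g., "addressed" vs. "not addressed") to each feedback item; the function and the example labels below are illustrative, not the study's actual data:

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies. Undefined when p_e == 1.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from marginal label distributions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

On the kappa<0.6 threshold the authors use for "less agreement": on the common Landis-Koch scale, values below 0.6 fall outside the "substantial" and "almost perfect" bands, while 0.832 sits in "almost perfect," matching the study's characterization.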
Pages: 14