Large language models (LLMs) in radiology exams for medical students: Performance and consequences

Times Cited: 0
Authors
Gotta, Jennifer [1 ]
Hong, Quang Anh Le [1 ]
Koch, Vitali [1 ]
Gruenewald, Leon D. [1 ]
Geyer, Tobias [2 ]
Martin, Simon S. [1 ]
Scholtz, Jan-Erik [1 ]
Booz, Christian [1 ]
Dos Santos, Daniel Pinto [1 ]
Mahmoudi, Scherwin [1 ]
Eichler, Katrin [1 ]
Gruber-Rouh, Tatjana [1 ]
Hammerstingl, Renate [1 ]
Biciusca, Teodora [1 ]
Juergens, Lisa Joy [1 ]
Hoehne, Elena [1 ]
Mader, Christoph [1 ]
Vogl, Thomas J. [1 ]
Reschke, Philipp [1 ]
Affiliations
[1] Goethe Univ Frankfurt, Dept Diagnost & Intervent Radiol, Frankfurt, Germany
[2] Rostock Univ, Med Ctr, Inst Diagnost & Intervent Radiol, Pediat Radiol & Neuroradiol, Rostock, Germany
Source
ROFO-FORTSCHRITTE AUF DEM GEBIET DER RONTGENSTRAHLEN UND DER BILDGEBENDEN VERFAHREN | 2024
Keywords
AI; medical; education
DOI
10.1055/a-2437-2067
Chinese Library Classification
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Discipline Codes
1002; 100207; 1009
Abstract
Purpose: The evolving field of medical education is being shaped by technological advances, including the integration of large language models (LLMs) such as ChatGPT. These models could be invaluable resources for medical students, simplifying complex concepts, supporting interactive learning, and providing personalized assistance. LLMs have shown impressive performance on professional examinations even without domain-specific training, making them particularly relevant to medicine. This study assesses the performance of LLMs on radiology examinations for medical students, shedding light on their current capabilities and implications.
Materials and Methods: The study used 151 multiple-choice questions drawn from radiology exams for medical students. The questions were categorized by type and topic and then processed with OpenAI's GPT-3.5 and GPT-4 via their API, or entered manually into Perplexity AI with GPT-3.5 and Bing. LLM performance was evaluated overall, by question type, and by topic.
Results: GPT-3.5 achieved an overall accuracy of 67.6% on the 151 questions, while GPT-4 significantly outperformed it with an overall accuracy of 88.1% (p<0.001). GPT-4 outperformed GPT-3.5, Perplexity AI, and medical students on both lower-order and higher-order questions, excelling particularly on higher-order questions. All GPT models would have passed the radiology exam for medical students at our university.
Conclusion: Our study highlights the potential of LLMs as accessible knowledge resources for medical students. GPT-4 performed well on both lower-order and higher-order questions, making ChatGPT-4 a potentially very useful tool for reviewing radiology exam questions. Radiologists should be aware of ChatGPT's limitations, including its tendency to provide incorrect responses with confidence.
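The Methods describe submitting the multiple-choice questions to GPT-3.5 and GPT-4 via OpenAI's API and scoring the responses. The paper's own code is not included in this record; a minimal sketch of such a grading pipeline, under assumptions about the prompt wording, the answer-extraction rule, and the hypothetical `ask_mcq`/`accuracy` helpers, might look like the following:

```python
# Illustrative sketch only: prompt format, extraction rule, and helper
# names are assumptions, not the authors' actual pipeline.
import re
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_mcq(model: str, question: str, options: dict[str, str]) -> str:
    """Send one multiple-choice question and return the chosen letter."""
    prompt = question + "\n" + "\n".join(f"{k}) {v}" for k, v in options.items())
    prompt += "\nAnswer with the letter of the single best option."
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # reduce sampling variability for grading
    )
    text = resp.choices[0].message.content or ""
    match = re.search(r"\b([A-E])\b", text.upper())  # first option letter found
    return match.group(1) if match else ""

def accuracy(model: str, exam: list[dict]) -> float:
    """Fraction of exam questions the model answers correctly."""
    correct = sum(
        ask_mcq(model, q["question"], q["options"]) == q["answer"] for q in exam
    )
    return correct / len(exam)

# Hypothetical usage with a one-question exam:
exam = [{
    "question": "Which modality is first-line for suspected acute stroke?",
    "options": {"A": "CT", "B": "Ultrasound", "C": "Mammography"},
    "answer": "A",
}]
# print(accuracy("gpt-4", exam), accuracy("gpt-3.5-turbo", exam))
```

Comparing per-question correctness of two models on the same 151 items, as in the reported GPT-4 vs. GPT-3.5 result, would then be a paired comparison over the two answer vectors.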
Pages: 11