Large language models for generating medical examinations: systematic review

Cited by: 12
Authors
Artsi, Yaara [1]
Sorin, Vera [2, 3, 4]
Konen, Eli [2, 3]
Glicksberg, Benjamin S. [5]
Nadkarni, Girish [5, 6]
Klang, Eyal [5, 6]
Affiliations
[1] Bar Ilan Univ, Azrieli Fac Med, Ha Hadas St 1, IL-7550598 Rishon Leziyyon, Israel
[2] Chaim Sheba Med Ctr, Dept Diag Imaging, Ramat Gan, Israel
[3] Tel Aviv Univ, Sch Med, Tel Aviv, Israel
[4] Chaim Sheba Med Ctr, DeepVis Lab, Ramat Gan, Israel
[5] Icahn Sch Med Mt Sinai, Div Data Driven & Digital Med D3M, New York, NY USA
[6] Icahn Sch Med Mt Sinai, Charles Bronfman Inst Personalized Med, New York, NY USA
Keywords
Large language models; Generative pre-trained transformer; Multiple choice questions; Medical education; Artificial intelligence; Medical examination
DOI
10.1186/s12909-024-05239-y
Chinese Library Classification number
G40 [Education]
Discipline classification codes
040101; 120403
Abstract
Background: Writing multiple choice questions (MCQs) for medical exams is challenging. It requires extensive medical knowledge, time and effort from medical educators. This systematic review focuses on the application of large language models (LLMs) in generating medical MCQs.
Methods: The authors searched for studies published up to November 2023. Search terms focused on LLM-generated MCQs for medical examinations. Studies that were non-English, outside the date range, or not focused on AI-generated multiple-choice questions were excluded. MEDLINE was used as the search database. Risk of bias was evaluated using a tailored QUADAS-2 tool. The study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
Results: Overall, eight studies published between April 2023 and October 2023 were included. Six studies used Chat-GPT 3.5, while two employed GPT 4. Five studies showed that LLMs can produce competent questions valid for medical exams. Three studies used LLMs to write medical questions but did not evaluate the validity of the questions. One study conducted a comparative analysis of different models. One other study compared LLM-generated questions with those written by humans. All studies presented faulty questions that were deemed inappropriate for medical exams. Some questions required additional modifications in order to qualify. Two studies were at high risk of bias.
Conclusions: LLMs can be used to write MCQs for medical examinations. However, their limitations cannot be ignored. Further study in this field is essential and more conclusive evidence is needed. Until then, LLMs may serve as a supplementary tool for writing medical examinations.
Pages: 11