Evaluating the value of AI-generated questions for USMLE step 1 preparation: A study using ChatGPT-3.5

Cited by: 0
Authors
Balu, Alan [1]
Prvulovic, Stefan T. [1]
Perez, Claudia Fernandez [1]
Kim, Alexander [1]
Donoho, Daniel A. [3]
Keating, Gregory [2]
Affiliations
[1] Georgetown Univ, Dept Neurosurg, Sch Med, Washington, DC USA
[2] Medstar Georgetown Univ Hosp, Dept Neurosurg, Washington, DC USA
[3] Childrens Natl Hosp, Dept Neurosurg, Washington, DC USA
Keywords
ChatGPT; USMLE; Step 1; LLM; ARTIFICIAL-INTELLIGENCE; FUTURE
DOI
10.1080/0142159X.2025.2478872
Chinese Library Classification number
G40 [Education];
Discipline classification code
040101; 120403;
Abstract
Purpose: Students are increasingly relying on artificial intelligence (AI) for medical education and exam preparation. However, the factual accuracy and content distribution of AI-generated exam questions for self-assessment have not been systematically investigated. Methods: Curated prompts were created to generate multiple-choice questions matching the USMLE Step 1 examination style. We used ChatGPT-3.5 to generate 50 questions and answers for each prompt style. We manually examined the output for factual accuracy, Bloom's Taxonomy level, and category within the USMLE Step 1 content outline. Results: ChatGPT-3.5 generated 150 multiple-choice case-style questions and selected an answer for each. Overall, 83% of generated multiple-choice questions had no factual inaccuracies and 15% contained one to two factual inaccuracies. With simple prompting, common themes included deep venous thrombosis, myocardial infarction, and thyroid disease. Topic diversity improved when content topic generation was separated from question generation, and specificity to Step 1 increased when the prompt indicated that "treatment" questions were not desired. Conclusion: We demonstrate that ChatGPT-3.5 can successfully generate Step 1 style questions with reasonable factual accuracy, and this method may be used by medical students preparing for USMLE examinations. While AI-generated questions demonstrated adequate factual accuracy, targeted prompting techniques should be used to overcome ChatGPT's bias towards particular medical conditions.
Pages: 9