In-depth analysis of ChatGPT's performance based on specific signaling words and phrases in the question stem of 2377 USMLE step 1 style questions

Cited by: 3
Authors
Knoedler, Leonard [1 ,2 ]
Knoedler, Samuel [3 ,4 ]
Hoch, Cosima C. [5 ]
Prantl, Lukas [6 ]
Frank, Konstantin [7 ]
Soiderer, Laura [8 ]
Cotofana, Sebastian [9 ,10 ,11 ]
Dorafshar, Amir H. [12 ]
Schenck, Thilo [13 ]
Vollbach, Felix [14 ]
Sofo, Giuseppe [15 ]
Alfertshofer, Michael [3 ,16 ]
Affiliations
[1] Humboldt Univ, Charite Univ Med Berlin, Freie Univ Berlin, Dept Oral & Maxillofacial Surg, Berlin, Germany
[2] Berlin Inst Hlth, Berlin, Germany
[3] Tech Univ Munich, Klinikum Rechts Isar, Dept Plast Surg & Hand Surg, D-81675 Munich, Germany
[4] Harvard Med Sch, Brigham & Womens Hosp, Div Plast Surg, Dept Surg, Boston, MA USA
[5] Tech Univ Munich TUM, Sch Med, Dept Otolaryngol Head & Neck Surg, Munich, Germany
[6] Univ Hosp Regensburg, Dept Plast Hand & Reconstruct Surg, Regensburg, Germany
[7] Ocean Clin, Marbella, Spain
[8] Univ Hosp Regensburg, Regensburg, Germany
[9] Erasmus MC, Dept Dermatol, Rotterdam, Netherlands
[10] Queen Mary Univ London, Blizard Inst, Ctr Cutaneous Res, London, England
[11] Guangdong Second Prov Gen Hosp, Dept Plast & Reconstruct Surg, Guangzhou, Guangdong, Peoples R China
[12] Emory Univ, Sch Med, Dept Surg, Atlanta, GA USA
[13] Private Practice Diabetol, Munich, Germany
[14] Ludwig Maximilians Univ Munchen, Dept Hand Plast & Aesthet Surg, Munich, Germany
[15] Pontificia Univ Catolica Rio de Janeiro, Hosp Santa Casa Misericordia, Inst Ivo Pitanguy, Rio De Janeiro, Brazil
[16] Ludwig Maximilians Univ Munchen, Dept Oromaxillofacial Surg, Munich, Germany
Source
SCIENTIFIC REPORTS | 2024, Vol. 14, Iss. 1
Keywords
ChatGPT; USMLE; USMLE Step 1; OpenAI; Medical Education; Clinical Decision-Making;
DOI
10.1038/s41598-024-63997-7
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biosciences]; N [General Natural Sciences]
Subject Classification Codes
07; 0710; 09
Abstract
ChatGPT has garnered attention as a multifaceted AI chatbot with potential applications in medicine. Despite intriguing preliminary findings in areas such as clinical management and patient education, a substantial knowledge gap remains in comprehensively understanding the opportunities and limitations of ChatGPT's capabilities, especially in medical test-taking and education. A total of n = 2,729 USMLE Step 1 practice questions were extracted from the Amboss question bank. After exclusion of 352 image-based questions, the remaining 2,377 text-based questions were categorized and entered manually into ChatGPT, and its responses were recorded. ChatGPT's overall performance was analyzed by question difficulty, category, and content with regard to specific signal words and phrases. ChatGPT achieved an overall accuracy of 55.8% on the n = 2,377 USMLE Step 1 preparation questions obtained from the Amboss online question bank. It demonstrated a significant inverse correlation between question difficulty and performance (r_s = -0.306; p < 0.001), maintaining accuracy comparable to the human user peer group across all levels of question difficulty. Notably, ChatGPT outperformed on serology-related questions (61.1% vs. 53.8%; p = 0.005) but struggled with ECG-related content (42.9% vs. 55.6%; p = 0.021). ChatGPT performed statistically significantly worse on pathophysiology-related question stems (signal phrase: "what is the most likely/probable cause"). Overall, ChatGPT performed consistently across question categories and difficulty levels. These findings emphasize the need for further investigation of the potential and limitations of ChatGPT in medical examinations and education.
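The two kinds of statistics reported in the abstract — a Spearman rank correlation between question difficulty and correctness, and category-vs-rest accuracy comparisons — can be sketched as follows. This is an illustrative reconstruction with simulated, hypothetical data, not the authors' code or their actual dataset; the difficulty scale, category flag, and effect sizes below are assumptions.

```python
# Illustrative sketch (hypothetical data, not the study's records) of the
# two analyses named in the abstract: Spearman correlation of difficulty
# with correctness, and a 2x2 chi-squared test for one content category.
import numpy as np
from scipy.stats import spearmanr, chi2_contingency

rng = np.random.default_rng(0)
n = 500  # hypothetical sample; the paper used 2,377 questions

# Assumed Amboss-style difficulty levels 1-5, and simulated correctness
# whose probability declines with difficulty (to mimic the reported trend).
difficulty = rng.integers(1, 6, size=n)
p_correct = 0.75 - 0.08 * (difficulty - 1)
correct = (rng.random(n) < p_correct).astype(int)

# Spearman rank correlation between difficulty and correctness
# (the paper reports r_s = -0.306, p < 0.001).
r_s, p_val = spearmanr(difficulty, correct)
print(f"r_s = {r_s:.3f}, p = {p_val:.3g}")

# Accuracy on one hypothetical content category (e.g. ECG-related)
# versus all remaining questions, tested with a chi-squared test on a
# 2x2 contingency table: rows = in/out of category, cols = correct/wrong.
is_cat = rng.random(n) < 0.10  # hypothetical category membership flag
cat, rest = correct[is_cat], correct[~is_cat]
table = np.array([
    [cat.sum(), len(cat) - cat.sum()],
    [rest.sum(), len(rest) - rest.sum()],
])
chi2, p_cat, dof, _ = chi2_contingency(table)
print(f"category accuracy = {cat.mean():.1%}, "
      f"rest = {rest.mean():.1%}, p = {p_cat:.3g}")
```

With a per-question table of difficulty, category, and correctness, the same two calls reproduce the structure of the paper's analysis; only the input data would differ.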
Pages: 9