Comparative Assessment of Otolaryngology Knowledge Among Large Language Models

Times cited: 0
Authors
Merlino, Dante J. [1 ]
Brufau, Santiago R. [1 ]
Saieed, George [1 ]
Van Abel, Kathryn M. [1 ]
Price, Daniel L. [1 ]
Archibald, David J. [2 ]
Ator, Gregory A. [3 ]
Carlson, Matthew L. [1 ,4 ]
Affiliations
[1] Mayo Clin, Dept Otolaryngol Head & Neck Surg, 200 1st St SW, Rochester, MN 55905 USA
[2] Ctr Plast Surg Castle Rock, Castle Rock, CO USA
[3] Univ Kansas, Med Ctr, Dept Otolaryngol Head & Neck Surg, Kansas City, KS USA
[4] Mayo Clin, Dept Neurol Surg, Rochester, MN USA
Keywords
AI; artificial intelligence; education; ENT; large language models; otolaryngology
DOI
10.1002/lary.31781
Chinese Library Classification (CLC) number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
This study assessed the baseline knowledge of advanced large language models (GPT-3.5 and GPT-4 by OpenAI; PaLM2 and MedPaLM by Google; LLama3:70b by Meta) in otolaryngology-head and neck surgery, using a dataset of 4566 multiple-choice, board-style questions. The highest-performing model, GPT-4, answered 77% of questions correctly, while the lowest-performing model, PaLM2, answered 56.5% correctly; the free, open-source model LLama3:70b answered 66.8% correctly. Performance improved across models when they were asked to provide the reasoning behind their responses, with GPT-4 changing an incorrect answer to the correct one 31% of the time.
Pages: 629-634
Page count: 6
Related papers
50 results in total
  • [1] The Comparative Diagnostic Capability of Large Language Models in Otolaryngology
    Warrier, Akshay
    Singh, Rohan
    Haleem, Afash
    Zaki, Haider
    Eloy, Jean Anderson
    LARYNGOSCOPE, 2024, 134 (09) : 3997 - 4002
  • [2] Assessment of otolaryngology knowledge among primary care providers in Saudi Arabia
    Alobaida, Badr A.
    Alayed, Faisal A.
    Alrudayni, Abdullah K.
    Alzabadin, Rakan A.
    Aldera, Sultan A.
    Alrajhi, Naif I.
    Alfrayan, Meshal I.
    Almassari, Abdulrahman K.
    Alrakaf, Feras A.
    Alotaibi, Fahad Z.
    MEDICAL SCIENCE, 2022, 26 (119)
  • [3] A Comparative Analysis of Three Large Language Models on Bruxism Knowledge
    Camargo, Elisa Souza
    Quadras, Isabella Christina Costa
    Garanhani, Roberto Ramos
    de Araujo, Cristiano Miranda
    Stuginski-Barbosa, Juliana
    JOURNAL OF ORAL REHABILITATION, 2025
  • [4] Evaluating Intelligence and Knowledge in Large Language Models
    Bianchini, Francesco
    TOPOI-AN INTERNATIONAL REVIEW OF PHILOSOPHY, 2025, 44 (01) : 163 - 173
  • [5] A comparative analysis of knowledge injection strategies for large language models in the domain
    Cadeddu, Andrea
    Chessa, Alessandro
    De Leo, Vincenzo
    Fenu, Gianni
    Motta, Enrico
    Osborne, Francesco
    Recupero, Diego Reforgiato
    Salatino, Angelo
    Secchi, Luca
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [6] Performance Assessment of Large Language Models in Medical Consultation: Comparative Study
    Seo, Sujeong
    Kim, Kyuli
    Yang, Heyoung
    JMIR MEDICAL INFORMATICS, 2025, 13
  • [7] Large Language Models in Otolaryngology Residency Admissions: A Random Sampling Analysis
    Halagur, Akash S.
    Balakrishnan, Karthik
    Ayoub, Noel
    LARYNGOSCOPE, 2025, 135 (01) : 87 - 93
  • [8] Variability in Large Language Models' Responses to Medical Licensing and Certification Examinations. Comment on "How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment"
    Epstein, Richard H.
    Dexter, Franklin
    JMIR MEDICAL EDUCATION, 2023, 9
  • [9] Comparative Assessment of Protein Large Language Models for Enzyme Commission Number Prediction
    Capela, Joao
    Zimmermann-Kogadeeva, Maria
    van Dijk, Aalt D. J.
    de Ridder, Dick
    Dias, Oscar
    Rocha, Miguel
    BMC BIOINFORMATICS, 2025, 26 (01)
  • [10] Large Language Models Versus Expert Clinicians in Crisis Prediction Among Telemental Health Patients: Comparative Study
    Lee, Christine
    Mohebbi, Matthew
    O'Callaghan, Erin
    Winsberg, Mirene
    JMIR MENTAL HEALTH, 2024, 11