Comparative Assessment of Otolaryngology Knowledge Among Large Language Models

Authors
Merlino, Dante J. [1 ]
Brufau, Santiago R. [1 ]
Saieed, George [1 ]
Van Abel, Kathryn M. [1 ]
Price, Daniel L. [1 ]
Archibald, David J. [2 ]
Ator, Gregory A. [3 ]
Carlson, Matthew L. [1 ,4 ]
Affiliations
[1] Mayo Clin, Dept Otolaryngol Head & Neck Surg, 200 1st St SW, Rochester, MN 55905 USA
[2] Ctr Plast Surg Castle Rock, Castle Rock, CO USA
[3] Univ Kansas, Med Ctr, Dept Otolaryngol Head & Neck Surg, Kansas City, KS USA
[4] Mayo Clin, Dept Neurol Surg, Rochester, MN USA
Keywords
AI; artificial intelligence; education; ENT; large language models; otolaryngology
DOI
10.1002/lary.31781
Chinese Library Classification number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
This study assessed the baseline knowledge of advanced large language models (GPT-3.5 and GPT-4 by OpenAI; PaLM2 and MedPaLM by Google; LLama3:70b by Meta) in topics within otolaryngology-head and neck surgery, using a dataset of 4566 multiple-choice, board-style questions. The highest-performing model, GPT-4, answered correctly 77% of the time, while the lowest-performing model, PaLM2, was correct on 56.5% of its responses; the free, open-source model LLama3:70b correctly answered 66.8% of questions. Performance improved across models when they were asked to provide the reasoning behind their responses, with GPT-4 changing its incorrect answers to correct ones 31% of the time.
Pages: 629-634
Page count: 6