Comparative Assessment of Otolaryngology Knowledge Among Large Language Models

Times cited: 0
Authors
Merlino, Dante J. [1]
Brufau, Santiago R. [1]
Saieed, George [1]
Van Abel, Kathryn M. [1]
Price, Daniel L. [1]
Archibald, David J. [2]
Ator, Gregory A. [3]
Carlson, Matthew L. [1, 4]
Affiliations
[1] Mayo Clin, Dept Otolaryngol Head & Neck Surg, 200 1st St SW, Rochester, MN 55905 USA
[2] Ctr Plast Surg Castle Rock, Castle Rock, CO USA
[3] Univ Kansas, Med Ctr, Dept Otolaryngol Head & Neck Surg, Kansas City, KS USA
[4] Mayo Clin, Dept Neurol Surg, Rochester, MN USA
Keywords
AI; artificial intelligence; education; ENT; large language models; otolaryngology
DOI
10.1002/lary.31781
Chinese Library Classification number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
This study assessed the baseline knowledge of advanced large language models (GPT-3.5 and GPT-4 by OpenAI; PaLM2 and MedPaLM by Google; LLama3:70b by Meta) on topics within otolaryngology-head and neck surgery, using a dataset of 4566 multiple-choice, board-style questions. The highest-performing model, GPT-4, answered 77% of questions correctly, while the lowest-performing model, PaLM2, was correct on 56.5% of its responses; the free, open-source model LLama3:70b answered 66.8% of questions correctly. Performance improved across models when they were asked to provide the reasoning behind their responses, with GPT-4 changing its initially incorrect answers to correct ones 31% of the time.
Pages: 629-634
Number of pages: 6