Comparative Assessment of Otolaryngology Knowledge Among Large Language Models

Cited by: 0
Authors
Merlino, Dante J. [1]
Brufau, Santiago R. [1]
Saieed, George [1]
Van Abel, Kathryn M. [1]
Price, Daniel L. [1]
Archibald, David J. [2]
Ator, Gregory A. [3]
Carlson, Matthew L. [1, 4]
Affiliations
[1] Mayo Clin, Dept Otolaryngol Head & Neck Surg, 200 1st St SW, Rochester, MN 55905 USA
[2] Ctr Plast Surg Castle Rock, Castle Rock, CO USA
[3] Univ Kansas, Med Ctr, Dept Otolaryngol Head & Neck Surg, Kansas City, KS USA
[4] Mayo Clin, Dept Neurol Surg, Rochester, MN USA
Keywords
AI; artificial intelligence; education; ENT; large language models; otolaryngology
DOI
10.1002/lary.31781
Chinese Library Classification number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
This study assessed the baseline knowledge of advanced large language models (GPT-3.5 and GPT-4 by OpenAI; PaLM2 and MedPaLM by Google; LLama3:70b by Meta) in topics within otolaryngology-head and neck surgery, using a dataset of 4566 multiple-choice, board-style questions. The highest-performing model, GPT-4, answered correctly 77% of the time, while the lowest-performing model, PaLM2, was correct on 56.5% of its responses; the free, open-source model LLama3:70b correctly answered 66.8% of questions. Performance improved across models when they were asked to provide the reasoning behind their responses, with GPT-4 changing its incorrect answers to correct ones 31% of the time.
Pages: 629-634
Page count: 6