Comparative Assessment of Otolaryngology Knowledge Among Large Language Models

Cited by: 0

Authors:

Merlino, Dante J. [1]

Brufau, Santiago R. [1]

Saieed, George [1]

Van Abel, Kathryn M. [1]

Price, Daniel L. [1]

Archibald, David J. [2]

Ator, Gregory A. [3]

Carlson, Matthew L. [1,4]

Affiliations:

[1] Mayo Clin, Dept Otolaryngol Head & Neck Surg, 200 1st St SW, Rochester, MN 55905 USA

[2] Ctr Plast Surg Castle Rock, Castle Rock, CO USA

[3] Univ Kansas, Med Ctr, Dept Otolaryngol Head & Neck Surg, Kansas City, KS USA

[4] Mayo Clin, Dept Neurol Surg, Rochester, MN USA

Source:

LARYNGOSCOPE | 2025 / Vol. 135 / Issue 02

Keywords:

AI; artificial intelligence; education; ENT; large language models; otolaryngology

DOI:

10.1002/lary.31781

Chinese Library Classification (CLC):

R-3 [Medical research methods]; R3 [Basic medicine]

Discipline classification code:

1001

Abstract:

This study assessed the baseline knowledge of advanced large language models (GPT-3.5 and GPT-4 by OpenAI; PaLM2 and MedPaLM by Google; LLama3:70b by Meta) in otolaryngology-head and neck surgery, using a dataset of 4566 multiple-choice, board-style questions. The highest-performing model, GPT-4, answered correctly 77% of the time, while the lowest-performing model, PaLM2, was correct on 56.5% of its responses; the free, open-source model LLama3:70b correctly answered 66.8% of questions. Performance improved across models when they were asked to provide the reasoning behind their responses, with GPT-4 changing its incorrect answers to correct ones 31% of the time.

Pages: 629-634

Page count: 6
