Performance of large language models in oral and maxillofacial surgery examinations

Cited by: 2

Authors:

Quah, B. [1,2]

Yong, C. W. [1,2]

Lai, C. W. M. [1]

Islam, I. [1,2]

Affiliations:

[1] Natl Univ Singapore, Fac Dent, 9 Lower Kent Ridge Rd, Singapore 119085, Singapore

[2] Natl Univ Ctr Oral Hlth, Discipline Oral & Maxillofacial Surg, Singapore, Singapore

Source:

INTERNATIONAL JOURNAL OF ORAL AND MAXILLOFACIAL SURGERY | 2024 / Vol. 53 / Issue 10

Keywords:

Artificial intelligence; Oral surgery; Dental education; Academic performance; Dentistry;

DOI:

10.1016/j.ijom.2024.06.003

Chinese Library Classification number:

R78 [Stomatology];

Discipline classification code:

1003 ;

Abstract:

This study aimed to determine the accuracy of large language models (LLMs) in answering oral and maxillofacial surgery (OMS) multiple choice questions. A total of 259 questions from the university's question bank were answered by the LLMs (GPT-3.5, GPT-4, Llama 2, Gemini, and Copilot). The scores per category as well as the total score out of 259 were recorded and evaluated, with the passing score set at 50%. The mean overall score amongst all LLMs was 62.5%. GPT-4 performed the best (76.8%, 95% confidence interval (CI) 71.4-82.2%), followed by Copilot (72.6%, 95% CI 67.2-78.0%), GPT-3.5 (62.2%, 95% CI 56.4-68.0%), Gemini (58.7%, 95% CI 52.9-64.5%), and Llama 2 (42.5%, 95% CI 37.1-48.6%). There was a statistically significant difference between the scores of the five LLMs overall (χ² = 79.9, df = 4, P < 0.001) and within all categories except 'basic sciences' (P = 0.129), 'dentoalveolar and implant surgery' (P = 0.052), and 'oral medicine/pathology/radiology' (P = 0.801). The LLMs performed best in 'basic sciences' (68.9%) and poorest in 'pharmacology' (45.9%). The LLMs can be used as adjuncts in teaching, but should not be used for clinical decision-making until the models are further developed and validated.
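
The overall comparison reported in the abstract can be checked with a short calculation. The sketch below is illustrative only and is not the authors' analysis code. It assumes that each model's number of correct answers can be recovered by rounding the reported percentage of 259 questions, and that the overall test was a Pearson chi-square test on the resulting 5 x 2 table of correct versus incorrect answers. The function names are hypothetical, and the closed-form P value used here holds only for df = 4.

```python
import math

TOTAL_QUESTIONS = 259

# Percentage of correct answers per model, as reported in the abstract.
reported_pct = {
    "GPT-4": 76.8,
    "Copilot": 72.6,
    "GPT-3.5": 62.2,
    "Gemini": 58.7,
    "Llama 2": 42.5,
}

# Assumption: raw counts are recovered by rounding pct * 259 to the nearest integer.
correct = {m: round(p / 100 * TOTAL_QUESTIONS) for m, p in reported_pct.items()}


def pearson_chi_square(correct_counts, n_per_group):
    """Pearson chi-square statistic for a (groups x 2) correct/incorrect table."""
    groups = len(correct_counts)
    pooled = sum(correct_counts) / (groups * n_per_group)
    exp_correct = n_per_group * pooled
    exp_incorrect = n_per_group * (1 - pooled)
    stat = 0.0
    for observed in correct_counts:
        stat += (observed - exp_correct) ** 2 / exp_correct
        stat += ((n_per_group - observed) - exp_incorrect) ** 2 / exp_incorrect
    return stat, groups - 1


def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variable with exactly 4 df."""
    return math.exp(-x / 2) * (1 + x / 2)


stat, df = pearson_chi_square(list(correct.values()), TOTAL_QUESTIONS)
p_value = chi2_sf_df4(stat)
mean_pct = 100 * sum(correct.values()) / (len(correct) * TOTAL_QUESTIONS)

print(f"chi2 = {stat:.1f}, df = {df}")       # chi2 = 79.9, df = 4
print(f"P < 0.001: {p_value < 0.001}")       # P < 0.001: True
print(f"mean overall score = {mean_pct:.1f}%")  # mean overall score = 62.5%
```

Under these assumptions the reconstructed counts give the same chi-square statistic, degrees of freedom, and mean overall score as the abstract. The confidence intervals are not recomputed because the abstract does not state which interval method was used.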

Citation

Pages: 881-886

Number of pages: 6

50 entries in total

[1] The impact and opportunities of large language models like ChatGPT in oral and maxillofacial surgery: a narrative review
Puladi, B.
Gsaxner, C.
Kleesiek, J.
Hoelzle, F.
Roehrig, R.
Egger, J.
INTERNATIONAL JOURNAL OF ORAL AND MAXILLOFACIAL SURGERY, 2024, 53 (01) : 78 - 88
[2] ScholarGPT's performance in oral and maxillofacial surgery
Balel, Yunus
JOURNAL OF STOMATOLOGY ORAL AND MAXILLOFACIAL SURGERY, 2025, 126 (04)
[3] Large language models for generating medical examinations: systematic review
Artsi, Yaara
Sorin, Vera
Konen, Eli
Glicksberg, Benjamin S.
Nadkarni, Girish
Klang, Eyal
BMC MEDICAL EDUCATION, 2024, 24 (01)
[4] Short-Answer Examinations Improve Student Performance in an Oral and Maxillofacial Pathology Course
Pinckard, R. Neal
McMahan, C. Alex
Prihoda, Thomas J.
Littlefield, John H.
Jones, Anne Cale
JOURNAL OF DENTAL EDUCATION, 2009, 73 (08) : 950 - 961
[5] Oral maxillofacial surgery resident, faculty and practitioner role models and dental students' interest in oral maxillofacial surgery careers: Does gender matter?
Marti, Kyriaki C.
Edwards, Sean P.
Inglehart, Marita R.
JOURNAL OF DENTAL EDUCATION, 2023, 87 (07) : 1022 - 1032
[6] Large Language Models in Dental Licensing Examinations: Systematic Review and Meta-Analysis
Liu, Mingxin
Okuhara, Tsuyoshi
Huang, Wenbo
Ogihara, Atsushi
Nagao, Hikari Sophia
Okada, Hiroko
Kiuchi, Takahiro
INTERNATIONAL DENTAL JOURNAL, 2025, 75 (01) : 213 - 222
[7] Performance of Multimodal Large Language Models in Japanese Diagnostic Radiology Board Examinations (2021-2023)
Nakaura, Takeshi
Yoshida, Naofumi
Kobayashi, Naoki
Nagayama, Yasunori
Uetani, Hiroyuki
Kidoh, Masafumi
Oda, Seitaro
Funama, Yoshinori
Hirai, Toshinori
ACADEMIC RADIOLOGY, 2025, 32 (05) : 2394 - 2401
[8] MILA in Teaching Oral & Maxillofacial Surgery
Nallaswamy, V. Deepak
Madhulaxmi
Senthilmurugan
Neralla, Mahathi
Abhinav, R. P.
INTERNATIONAL JOURNAL OF EARLY CHILDHOOD SPECIAL EDUCATION, 2022, 14 (02) : 2116 - 2137
[9] Pain Medicine in Oral and Maxillofacial Surgery
Pandey, Chandrashekhar
Devadiga, Trupti
Thorat, Ashutosh Jaysing
Punde, Prashant Ashok
JOURNAL OF PHARMACEUTICAL RESEARCH INTERNATIONAL, 2021, 33 (40B) : 221 - 235
[10] The Comparative Performance of Large Language Models on the Hand Surgery Self-Assessment Examination
Chen, Clark J.
Sobol, Keenan
Hickey, Connor
Raphael, James
HAND-AMERICAN ASSOCIATION FOR HAND SURGERY, 2024,
