Performance of large language models in oral and maxillofacial surgery examinations

Cited by: 2

Authors:

Quah, B. [1,2]

Yong, C. W. [1,2]

Lai, C. W. M. [1]

Islam, I. [1,2]

Affiliations:

[1] Natl Univ Singapore, Fac Dent, 9 Lower Kent Ridge Rd, Singapore 119085, Singapore

[2] Natl Univ Ctr Oral Hlth, Discipline Oral & Maxillofacial Surg, Singapore, Singapore

Source:

INTERNATIONAL JOURNAL OF ORAL AND MAXILLOFACIAL SURGERY | 2024 / Vol. 53 / Issue 10

Keywords:

Artificial intelligence; Oral surgery; Dental education; Academic performance; Dentistry

DOI:

10.1016/j.ijom.2024.06.003

Chinese Library Classification (CLC) number:

R78 [Stomatology]

Discipline classification code:

1003

Abstract:

This study aimed to determine the accuracy of large language models (LLMs) in answering oral and maxillofacial surgery (OMS) multiple choice questions. A total of 259 questions from the university's question bank were answered by the LLMs (GPT-3.5, GPT-4, Llama 2, Gemini, and Copilot). The scores per category as well as the total score out of 259 were recorded and evaluated, with the passing score set at 50%. The mean overall score amongst all LLMs was 62.5%. GPT-4 performed the best (76.8%, 95% confidence interval (CI) 71.4-82.2%), followed by Copilot (72.6%, 95% CI 67.2-78.0%), GPT-3.5 (62.2%, 95% CI 56.4-68.0%), Gemini (58.7%, 95% CI 52.9-64.5%), and Llama 2 (42.5%, 95% CI 37.1-48.6%). There was a statistically significant difference between the scores of the five LLMs overall (χ² = 79.9, df = 4, P < 0.001) and within all categories except 'basic sciences' (P = 0.129), 'dentoalveolar and implant surgery' (P = 0.052), and 'oral medicine/pathology/radiology' (P = 0.801). The LLMs performed best in 'basic sciences' (68.9%) and poorest in 'pharmacology' (45.9%). The LLMs can be used as adjuncts in teaching, but should not be used for clinical decision-making until the models are further developed and validated.
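
The overall chi-square statistic reported in the abstract can be checked with a short sketch. This is an illustration under two assumptions that are not stated in the abstract: the per-model correct-answer counts are back-calculated from the rounded percentages (the abstract gives only percentages), and the test is taken to be a Pearson chi-square test of independence on a 5 x 2 table (model by correct/incorrect). The function names are hypothetical.

```python
# Sketch: re-derive the overall chi-square statistic from the abstract's figures.
# Assumption: counts are back-calculated from rounded percentages, and the test
# is a Pearson chi-square on a 5 x 2 (model x correct/incorrect) table.

TOTAL_QUESTIONS = 259

# Overall scores (%) as reported in the abstract.
reported_pct = {
    "GPT-4": 76.8,
    "Copilot": 72.6,
    "GPT-3.5": 62.2,
    "Gemini": 58.7,
    "Llama 2": 42.5,
}


def correct_counts(pct_by_model, n_questions):
    """Back-calculate whole-number correct answers from rounded percentages."""
    return {m: round(p / 100 * n_questions) for m, p in pct_by_model.items()}


def pearson_chi_square(counts, n_questions):
    """Pearson chi-square for a k x 2 table in which every row has n_questions."""
    k = len(counts)
    # Every model answered the same number of questions, so the expected
    # number correct is the same for each row.
    expected_correct = sum(counts.values()) / k
    expected_incorrect = n_questions - expected_correct
    stat = 0.0
    for correct in counts.values():
        incorrect = n_questions - correct
        stat += (correct - expected_correct) ** 2 / expected_correct
        stat += (incorrect - expected_incorrect) ** 2 / expected_incorrect
    return stat, k - 1  # degrees of freedom = (rows - 1) * (columns - 1)


counts = correct_counts(reported_pct, TOTAL_QUESTIONS)
chi2, df = pearson_chi_square(counts, TOTAL_QUESTIONS)
mean_pct = 100 * sum(counts.values()) / (len(counts) * TOTAL_QUESTIONS)

print(f"counts = {counts}")
print(f"chi-square = {chi2:.1f}, df = {df}, mean score = {mean_pct:.1f}%")
```

Under these assumptions the sketch gives a chi-square of about 79.9 with 4 degrees of freedom and a mean score of about 62.5%, in line with the reported values. It does not attempt to reproduce the confidence intervals, since the abstract does not say which interval method was used.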


Pages: 881-886

Page count: 6

