Performance of large language models in oral and maxillofacial surgery examinations

Cited by: 2

Authors:

Quah, B. [1,2]

Yong, C. W. [1,2]

Lai, C. W. M. [1]

Islam, I. [1,2]

Affiliations:

[1] Natl Univ Singapore, Fac Dent, 9 Lower Kent Ridge Rd, Singapore 119085, Singapore

[2] Natl Univ Ctr Oral Hlth, Discipline Oral & Maxillofacial Surg, Singapore, Singapore

Source:

INTERNATIONAL JOURNAL OF ORAL AND MAXILLOFACIAL SURGERY | 2024 / Volume 53 / Issue 10

Keywords:

Artificial intelligence; Oral surgery; Dental education; Academic performance; Dentistry;

DOI:

10.1016/j.ijom.2024.06.003

Chinese Library Classification number:

R78 [Stomatology];

Discipline classification code:

1003 ;

Abstract:

This study aimed to determine the accuracy of large language models (LLMs) in answering oral and maxillofacial surgery (OMS) multiple choice questions. A total of 259 questions from the university's question bank were answered by the LLMs (GPT-3.5, GPT-4, Llama 2, Gemini, and Copilot). The scores per category as well as the total score out of 259 were recorded and evaluated, with the passing score set at 50%. The mean overall score amongst all LLMs was 62.5%. GPT-4 performed the best (76.8%, 95% confidence interval (CI) 71.4-82.2%), followed by Copilot (72.6%, 95% CI 67.2-78.0%), GPT-3.5 (62.2%, 95% CI 56.4-68.0%), Gemini (58.7%, 95% CI 52.9-64.5%), and Llama 2 (42.5%, 95% CI 37.1-48.6%). There was a statistically significant difference between the scores of the five LLMs overall (χ² = 79.9, df = 4, P < 0.001) and within all categories except 'basic sciences' (P = 0.129), 'dentoalveolar and implant surgery' (P = 0.052), and 'oral medicine/pathology/radiology' (P = 0.801). The LLMs performed best in 'basic sciences' (68.9%) and poorest in 'pharmacology' (45.9%). The LLMs can be used as adjuncts in teaching, but should not be used for clinical decision-making until the models are further developed and validated.
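The overall chi-square comparison reported in the abstract can be checked with a short calculation. The sketch below is illustrative only and is not the authors' analysis code: the per-model counts of correct answers are not given in the abstract, so they are back-calculated here by rounding each reported percentage of 259, and the helper names (`pearson_chi_square`, `chi2_sf_df4`) are hypothetical. Under that assumption, a Pearson chi-square test on the resulting 5 x 2 table (model by correct/incorrect) gives a statistic consistent with the reported value.

```python
import math

N_QUESTIONS = 259  # total questions answered by each model (from the abstract)

# Overall accuracy per model, as reported in the abstract (percent).
reported_pct = {
    "GPT-4": 76.8,
    "Copilot": 72.6,
    "GPT-3.5": 62.2,
    "Gemini": 58.7,
    "Llama 2": 42.5,
}

# ASSUMPTION: counts of correct answers are reconstructed by rounding
# percentage * 259; the abstract does not state them directly.
correct = {m: round(p / 100 * N_QUESTIONS) for m, p in reported_pct.items()}


def pearson_chi_square(correct_counts, n_per_group):
    """Pearson chi-square for a k x 2 table with equal group sizes."""
    k = len(correct_counts)
    pooled = sum(correct_counts) / (k * n_per_group)  # pooled accuracy
    exp_correct = n_per_group * pooled
    exp_incorrect = n_per_group * (1 - pooled)
    chi2 = 0.0
    for c in correct_counts:
        chi2 += (c - exp_correct) ** 2 / exp_correct
        chi2 += ((n_per_group - c) - exp_incorrect) ** 2 / exp_incorrect
    df = (k - 1) * (2 - 1)
    return chi2, df


def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variable; valid for df = 4 only."""
    return math.exp(-x / 2) * (1 + x / 2)


chi2, df = pearson_chi_square(list(correct.values()), N_QUESTIONS)
p_value = chi2_sf_df4(chi2)

print(correct)
print(f"chi2 = {chi2:.1f}, df = {df}, P < 0.001: {p_value < 0.001}")
# prints: chi2 = 79.9, df = 4, P < 0.001: True
```

The closed-form tail probability is used only to keep the example dependency-free; it applies to 4 degrees of freedom specifically, and a statistics library would normally be used for the general case.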


Pages: 881-886

Page count: 6
