Performance of large language models in oral and maxillofacial surgery examinations

Cited by: 2
Authors
Quah, B. [1, 2]
Yong, C. W. [1, 2]
Lai, C. W. M. [1]
Islam, I. [1, 2]
Affiliations
[1] Natl Univ Singapore, Fac Dent, 9 Lower Kent Ridge Rd, Singapore 119085, Singapore
[2] Natl Univ Ctr Oral Hlth, Discipline Oral & Maxillofacial Surg, Singapore, Singapore
Keywords
Artificial intelligence; Oral surgery; Dental education; Academic performance; Dentistry
DOI
10.1016/j.ijom.2024.06.003
Chinese Library Classification number
R78 [Stomatology]
Discipline classification code
1003
Abstract
This study aimed to determine the accuracy of large language models (LLMs) in answering oral and maxillofacial surgery (OMS) multiple choice questions. A total of 259 questions from the university's question bank were answered by the LLMs (GPT-3.5, GPT-4, Llama 2, Gemini, and Copilot). The scores per category as well as the total score out of 259 were recorded and evaluated, with the passing score set at 50%. The mean overall score amongst all LLMs was 62.5%. GPT-4 performed the best (76.8%, 95% confidence interval (CI) 71.4-82.2%), followed by Copilot (72.6%, 95% CI 67.2-78.0%), GPT-3.5 (62.2%, 95% CI 56.4-68.0%), Gemini (58.7%, 95% CI 52.9-64.5%), and Llama 2 (42.5%, 95% CI 37.1-48.6%). There was a statistically significant difference between the scores of the five LLMs overall (χ² = 79.9, df = 4, P < 0.001) and within all categories except 'basic sciences' (P = 0.129), 'dentoalveolar and implant surgery' (P = 0.052), and 'oral medicine/pathology/radiology' (P = 0.801). The LLMs performed best in 'basic sciences' (68.9%) and poorest in 'pharmacology' (45.9%). The LLMs can be used as adjuncts in teaching, but should not be used for clinical decision-making until the models are further developed and validated.
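The overall comparison reported in the abstract can be checked with a short calculation. The sketch below is not the authors' analysis code: it assumes the test was a Pearson chi-square test on a 5 x 2 table (model by correct/incorrect), and it reconstructs the number of correct answers per model by rounding each reported percentage of 259 to the nearest integer. The function names are illustrative only.

```python
import math

TOTAL_QUESTIONS = 259

# Overall scores (percent correct) as reported in the abstract.
reported_pct = {
    "GPT-4": 76.8,
    "Copilot": 72.6,
    "GPT-3.5": 62.2,
    "Gemini": 58.7,
    "Llama 2": 42.5,
}

# Assumption: correct-answer counts are recovered by rounding pct * 259.
correct = {m: round(p / 100 * TOTAL_QUESTIONS) for m, p in reported_pct.items()}


def chi_square_equal_groups(correct_counts, n_per_group):
    """Pearson chi-square for a k x 2 table where every group has n_per_group items."""
    k = len(correct_counts)
    exp_correct = sum(correct_counts) / k          # expected correct per group
    exp_incorrect = n_per_group - exp_correct      # expected incorrect per group
    stat = 0.0
    for c in correct_counts:
        stat += (c - exp_correct) ** 2 / exp_correct
        stat += ((n_per_group - c) - exp_incorrect) ** 2 / exp_incorrect
    return stat, k - 1


def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variable with exactly 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)


stat, df = chi_square_equal_groups(list(correct.values()), TOTAL_QUESTIONS)
p_value = chi2_sf_df4(stat)
pooled_pct = 100 * sum(correct.values()) / (len(correct) * TOTAL_QUESTIONS)

print(f"correct counts: {correct}")
print(f"chi-square = {stat:.1f}, df = {df}, P = {p_value:.2e}")
print(f"pooled score = {pooled_pct:.1f}%")
```

Under these assumptions the calculation gives a chi-square statistic of 79.9 with 4 degrees of freedom and a pooled score of 62.5%, which agrees with the figures in the abstract. The closed-form tail probability used here is valid only for 4 degrees of freedom; a general-purpose statistics library would be the better choice for other table sizes.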
Pages: 881-886
Number of pages: 6