Large Language Models in Dental Licensing Examinations: Systematic Review and Meta-Analysis

Authors
Liu, Mingxin [1]
Okuhara, Tsuyoshi [2]
Huang, Wenbo [3]
Ogihara, Atsushi [4]
Nagao, Hikari Sophia [1]
Okada, Hiroko [2]
Kiuchi, Takahiro [2]
Affiliations
[1] Univ Tokyo, Grad Sch Med, Dept Hlth Commun, Hongo 7-3-1, Bunkyo Ku, Tokyo 1138655, Japan
[2] Univ Tokyo, Sch Publ Hlth, Grad Sch Med, Dept Hlth Commun, Tokyo, Japan
[3] Univ Tokyo, Sch Publ Hlth, Dept Clin Epidemiol & Hlth Econ, Tokyo, Japan
[4] Waseda Univ, Fac Human Sci, Tokorozawa, Japan
Keywords
Dentistry; Systematic review; Oral medicine; Dental education; Healthcare;
DOI
10.1016/j.identj.2024.10.014
Chinese Library Classification
R78 [Stomatology];
Discipline classification code
1003;
Abstract
Introduction and aims: This systematic review and meta-analysis evaluates the performance of various large language models (LLMs) in dental licensing examinations worldwide. The aim is to assess the accuracy of these models in different linguistic and geographical contexts, which will inform their potential application in dental education and diagnostics.
Methods: Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, we conducted a comprehensive search across PubMed, Web of Science, and Scopus for studies published from 1 January 2022 to 1 May 2024. Two authors independently reviewed the literature based on the inclusion and exclusion criteria, extracted data, and evaluated the quality of the studies in accordance with the Quality Assessment of Diagnostic Accuracy Studies-2. We conducted qualitative and quantitative analyses to evaluate the performance of LLMs.
Results: Eleven studies met the inclusion criteria, encompassing dental licensing examinations from eight countries. GPT-3.5, GPT-4, and Bard achieved integrated accuracy rates of 54%, 72%, and 56%, respectively. GPT-4 outperformed GPT-3.5 and Bard, passing more than half of the dental licensing examinations. Subgroup analyses and meta-regression showed that GPT-3.5 performed significantly better in English-speaking countries. GPT-4's performance, however, remained consistent across different regions.
Conclusion: LLMs, particularly GPT-4, show potential in dental education and diagnostics, yet their accuracy remains below the threshold required for clinical application. The lack of sufficient training data in dentistry has affected LLMs' accuracy, and the reliance on image-based diagnostics also presents challenges. As a result, their accuracy in dental exams is lower than in medical licensing exams. Additionally, LLMs even provide more detailed explanations for incorrect answers than for correct ones. Overall, the current LLMs are not yet suitable for use in dental education and clinical diagnosis.
(c) 2024 The Authors. Published by Elsevier Inc. on behalf of FDI World Dental Federation. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)
Pages: 213-222
Page count: 10