Evaluating the Performance of Artificial Intelligence-Based Large Language Models in Orthodontics-A Systematic Review and Meta-Analysis

Times cited: 0
Authors
Albalawi, Farraj [1 ,2 ]
Khanagar, Sanjeev B. [1 ,2 ]
Iyer, Kiran [1 ,2 ]
Alhazmi, Nora [1 ,2 ]
Alayyash, Afnan [3 ]
Alhazmi, Anwar S. [4 ]
Awawdeh, Mohammed [1 ,2 ]
Singh, Oinam Gokulchandra [2 ,5 ]
Affiliations
[1] King Saud Bin Abdulaziz Univ Hlth Sci, Coll Dent, Prevent Dent Sci Dept, Riyadh 11426, Saudi Arabia
[2] Minist Natl Guard Hlth Affairs, King Abdullah Int Med Res Ctr, Riyadh 11481, Saudi Arabia
[3] Jouf Univ, Coll Dent, Dept Prevent Dent, Sakaka 72345, Saudi Arabia
[4] Jazan Univ, Coll Dent, Dept Prevent Dent, Jazan 45142, Saudi Arabia
[5] King Saud Bin Abdulaziz Univ Hlth Sci, Coll Appl Med Sci, Radiol Sci Program, Riyadh 11426, Saudi Arabia
Source
APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 2
Keywords
artificial intelligence; deep learning; machine learning; large language models; orthodontics; clear aligners; knowledge; information; quality; ChatGPT
DOI
10.3390/app15020893
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
0703
Abstract
Background: In recent years, AI-based applications in healthcare have grown remarkably, with a significant breakthrough marked by the launch of large language models (LLMs) such as ChatGPT and Google Bard. Patients and health-professions students commonly use these models because they are easily accessible. The increasing use of LLMs in healthcare calls for an evaluation of their ability to generate accurate and reliable responses.
Objective: This study assessed the performance of LLMs in answering orthodontics-related queries through a systematic review and meta-analysis.
Methods: PubMed, Web of Science, Embase, Scopus, and Google Scholar were searched comprehensively up to 31 October 2024. The quality of the included studies was evaluated using the Prediction model Risk of Bias Assessment Tool (PROBAST), and RStudio (version 4.4.0) was used for the meta-analysis and heterogeneity assessment.
Results: Of 278 retrieved articles, 10 studies were included. The most commonly used LLM was ChatGPT (10/10 papers, 100%), followed by Google's Bard/Gemini (3/10, 30%) and Microsoft's Bing/Copilot AI (2/10, 20%). Accuracy was primarily evaluated using Likert scales, while the DISCERN tool was frequently applied to assess reliability. The meta-analysis indicated that ChatGPT-4 and the other LLMs did not differ significantly in generating responses to orthodontics-related queries. The forest plot showed a standardized mean difference (SMD) of 0.01 (CI: -0.42 to 0.44). No heterogeneity was observed between the experimental group (ChatGPT-3.5, Gemini, and Copilot) and the control group (ChatGPT-4). However, most studies had a high PROBAST risk of bias owing to the lack of standardized evaluation tools.
Conclusions: ChatGPT-4 has been used extensively for a variety of tasks and has shown advanced, encouraging outcomes compared with other LLMs; it can therefore be regarded as a valuable tool for enhancing education and learning. Although LLMs can generate comprehensive responses, their reliability is compromised by the absence of peer-reviewed references, so expert oversight is needed in healthcare applications.
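The abstract reports a pooled effect size from a forest plot together with a confidence interval, and mentions a heterogeneity assessment carried out in R. The record does not state which model or estimator was used, so the Python sketch below only illustrates one common approach, assuming Hedges' g as the per-study standardized mean difference and DerSimonian-Laird random-effects pooling with Cochran's Q and I². The helper names (`StudyArms`, `hedges_g`, `pool_random_effects`) are hypothetical and are not the authors' code.

```python
import math
from dataclasses import dataclass


@dataclass
class StudyArms:
    # Hypothetical container: summary statistics for one primary study,
    # comparing an "experimental" arm against a "control" arm.
    mean_t: float
    sd_t: float
    n_t: int
    mean_c: float
    sd_c: float
    n_c: int


def hedges_g(s: StudyArms) -> tuple[float, float]:
    """Return (g, variance of g): the bias-corrected standardized mean difference."""
    df = s.n_t + s.n_c - 2
    # Pooled within-study standard deviation.
    sd_pooled = math.sqrt(((s.n_t - 1) * s.sd_t ** 2 + (s.n_c - 1) * s.sd_c ** 2) / df)
    d = (s.mean_t - s.mean_c) / sd_pooled  # Cohen's d
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (s.n_t + s.n_c) / (s.n_t * s.n_c) + d ** 2 / (2 * (s.n_t + s.n_c))
    return j * d, j ** 2 * var_d


def pool_random_effects(effects: list[float], variances: list[float], z: float = 1.96):
    """DerSimonian-Laird pooling (an assumed choice). Returns (estimate, ci_low, ci_high, Q, I2)."""
    k = len(effects)
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0  # between-study variance
    # I^2: share of total variability attributed to between-study heterogeneity.
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    est = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return est, est - z * se, est + z * se, q, i2


if __name__ == "__main__":
    # Illustrative numbers only; these are NOT data from the review.
    studies = [
        StudyArms(4.1, 0.6, 30, 4.0, 0.7, 30),
        StudyArms(3.8, 0.9, 25, 4.0, 0.8, 25),
        StudyArms(4.3, 0.5, 40, 4.2, 0.6, 40),
    ]
    gs, vs = zip(*(hedges_g(s) for s in studies))
    est, lo, hi, q, i2 = pool_random_effects(list(gs), list(vs))
    print(f"SMD = {est:.2f} [CI: {lo:.2f} to {hi:.2f}], Q = {q:.2f}, I2 = {100 * i2:.0f}%")
```

The random-effects form is shown because it reduces to the fixed-effect inverse-variance result whenever the estimated between-study variance is zero, which is the situation an I² of 0% describes; a confidence interval that straddles zero corresponds to no significant difference between the compared groups.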
Pages: 20
Related papers
Total: 50
  • [1] Future of Orthodontics-A Systematic Review and Meta-Analysis on the Emerging Trends in This Field
    Alam, Mohammad Khursheed
    Abutayyem, Huda
    Kanwal, Bushra
    Shayeb, Maher A. L.
    JOURNAL OF CLINICAL MEDICINE, 2023, 12 (02)
  • [2] Comparative performance of artificial intelligence-based large language models on the orthopedic in-training examination
    Xu, Andrew Y.
    Singh, Manjot
    Balmaceno-Criss, Mariah
    Oh, Allison
    Leigh, David
    Daher, Mohammad
    Alsoof, Daniel
    Mcdonald, Christopher L.
    Diebo, Bassel G.
    Daniels, Alan H.
    JOURNAL OF ORTHOPAEDIC SURGERY, 2025, 33 (01)
  • [3] Diagnostic test accuracy of artificial intelligence-based imaging for lung cancer screening: A systematic review and meta-analysis
    Thong, Lay Teng
    Chou, Hui Shan
    Chew, Han Shi Jocelyn
    Lau, Ying
    LUNG CANCER, 2023, 176 : 4 - 13
  • [4] Accuracy of artificial intelligence for tooth extraction decision-making in orthodontics: a systematic review and meta-analysis
    Evangelista, Karine
    de Freitas Silva, Brunno Santos
    Yamamoto-Silva, Fernanda Paula
    Valladares-Neto, Jose
    Garcia Silva, Maria Alves
    Soares Cevidanes, Lucia Helena
    Canto, Graziela de Luca
    Massignan, Carla
    CLINICAL ORAL INVESTIGATIONS, 2022, 26 (12) : 6893 - 6905
  • [5] Artificial intelligence-based radiomics models in endometrial cancer: A systematic review
    Lecointre, Lise
    Dana, Jeremy
    Lodi, Massimo
    Akladios, Cherif
    Gallix, Benoit
    EJSO, 2021, 47 (11) : 2734 - 2741
  • [6] The predictive performance of artificial intelligence on the outcome of stroke: a systematic review and meta-analysis
    Yang, Yujia
    Tang, Li
    Deng, Yiting
    Li, Xuzi
    Luo, Anling
    Zhang, Zhao
    He, Li
    Zhu, Cairong
    Zhou, Muke
    FRONTIERS IN NEUROSCIENCE, 2023, 17
  • [7] Large language models in neurosurgery: a systematic review and meta-analysis
    Patil, Advait
    Serrato, Paul
    Chisvo, Nathan
    Arnaout, Omar
    See, Pokmeng Alfred
    Huang, Kevin T.
    ACTA NEUROCHIRURGICA, 2024, 166 (01)
  • [8] Utility of artificial intelligence-based large language models in ophthalmic care
    Biswas, Sayantan
    Davies, Leon N.
    Sheppard, Amy L.
    Logan, Nicola S.
    Wolffsohn, James S.
    OPHTHALMIC AND PHYSIOLOGICAL OPTICS, 2024, 44 (03) : 641 - 671
  • [9] Artificial intelligence in osteoarthritis detection: A systematic review and meta-analysis
    Mohammadi, Soheil
    Salehi, Mohammad Amin
    Jahanshahi, Ali
    Farahani, Mohammad Shahrabi
    Zakavi, Seyed Sina
    Behrouzieh, Sadra
    Gouravani, Mahdi
    Guermazi, Ali
    OSTEOARTHRITIS AND CARTILAGE, 2024, 32 (03) : 241 - 253
  • [10] Diagnostic Performance of Artificial Intelligence-Based Methods for Tuberculosis Detection: Systematic Review
    Hansun, Seng
    Argha, Ahmadreza
    Bakhshayeshi, Ivan
    Wicaksana, Arya
    Alinejad-Rokny, Hamid
    Fox, Greg J.
    Liaw, Siaw-Teng
    Celler, Branko G.
    Marks, Guy B.
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2025, 27