The performance of artificial intelligence language models in board-style dental knowledge assessment: A preliminary study on ChatGPT

Cited by: 18
Authors
Danesh, Arman [1 ,5 ]
Pazouki, Hirad [2 ]
Danesh, Kasra [3 ]
Danesh, Farzad
Danesh, Arsalan [4 ]
Affiliations
[1] Western Univ, Schulich Sch Med & Dent, London, ON, Canada
[2] Western Univ, Fac Hlth Sci, London, ON, Canada
[3] Florida Atlantic Univ, Coll Engn & Comp Sci, Boca Raton, FL 33431 USA
[4] Nova Southeastern Univ, Coll Dent Med, Dept Periodontol, Ft Lauderdale, FL 33314 USA
[5] Nova Southeastern Univ, Coll Dent Med, Dept Oral & Maxillofacial Surg, 3200 S Univ Dr, Davie, FL 33328 USA
Keywords
Artificial intelligence; ChatGPT; dental board examination; dental education; dentistry; Integrated National Board Dental Examination;
DOI
10.1016/j.adaj.2023.07.016
Chinese Library Classification
R78 [Stomatology]
Discipline Code
1003
Abstract
Background. Although Chat Generative Pre-trained Transformer (ChatGPT) (OpenAI) may be an appealing educational resource for students, the chatbot's responses can be subject to misinformation. This study was designed to evaluate the performance of ChatGPT on a board-style multiple-choice dental knowledge assessment to gauge its capacity to output accurate dental content and, in turn, the risk of misinformation associated with use of the chatbot as an educational resource by dental students.
Methods. ChatGPT3.5 and ChatGPT4 were asked questions obtained from 3 different sources: INBDE Bootcamp, ITDOnline, and a list of board-style questions provided by the Joint Commission on National Dental Examinations. Image-based questions were excluded, as ChatGPT accepts only text-based inputs. The mean performance across 3 trials was reported for each model.
Results. ChatGPT3.5 and ChatGPT4 answered 61.3% and 76.9% of the questions correctly on average, respectively. A 2-tailed t test was used to compare 2 independent sample means, and a 2-tailed χ2 test was used to compare 2 sample proportions. A P value less than .05 was considered statistically significant.
Conclusion. ChatGPT3.5 did not perform sufficiently well on the board-style knowledge assessment. ChatGPT4, however, displayed a competent ability to output accurate dental content. Future research should evaluate the proficiency of emerging models of ChatGPT in dentistry to assess their evolving role in dental education.
Practical Implications. Although ChatGPT showed an impressive ability to output accurate dental content, our findings should encourage dental students to use ChatGPT to supplement their existing learning program rather than as their primary learning resource.
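The statistical comparisons described in the abstract (a 2-tailed t test on the models' mean accuracies and a 2-tailed χ2 test on their proportions of correct answers) can be illustrated with a minimal Python sketch using SciPy. The per-trial scores and the exact number of scored questions are not reported in this record, so the trial values and the 200-question pool below are placeholder assumptions for illustration only, not the study's data.

```python
# Minimal sketch of the two comparisons described in the abstract.
# All numbers except the reported mean accuracies (61.3% and 76.9%) are placeholders.
from scipy.stats import ttest_ind, chi2_contingency

# Hypothetical per-trial accuracies (%) across the 3 trials for each model.
gpt35_trials = [60.0, 61.5, 62.4]   # placeholder values
gpt4_trials = [76.0, 77.2, 77.5]    # placeholder values

# Two-tailed t test comparing the two independent sample means.
t_stat, t_p = ttest_ind(gpt35_trials, gpt4_trials)

# Two-tailed chi-square test comparing two sample proportions,
# assuming an illustrative pool of 200 scored questions per model.
n_questions = 200
correct_35 = round(0.613 * n_questions)
correct_4 = round(0.769 * n_questions)
table = [
    [correct_35, n_questions - correct_35],  # ChatGPT3.5: correct vs. incorrect
    [correct_4, n_questions - correct_4],    # ChatGPT4: correct vs. incorrect
]
chi2, chi_p, _, _ = chi2_contingency(table)

print(f"t test:     t = {t_stat:.2f}, P = {t_p:.3f}")
print(f"chi-square: chi2 = {chi2:.2f}, P = {chi_p:.3f}")
print("Significant at P < .05" if chi_p < 0.05 else "Not significant at P < .05")
```

With these assumed counts, the proportion test mirrors the study's decision rule: a P value below .05 would indicate a statistically significant difference between the two models' accuracies.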
Pages: 970-974
Page count: 5