Performance of artificial intelligence on Turkish dental specialization exam: can ChatGPT-4.0 and gemini advanced achieve comparable results to humans?
Cited by: 1
Authors: Sismanoglu, Soner [1]; Capan, Belen Sirinoglu [2]
Affiliations:
[1] Istanbul Univ Cerrahpasa, Fac Dent, Dept Restorat Dent, Istanbul, Turkiye
[2] Istanbul Univ Cerrahpasa, Fac Dent, Dept Pediat Dent, Istanbul, Turkiye
Keywords: AI; Artificial Intelligence; ChatGPT; Dentistry; Gemini; Large language models
DOI: 10.1186/s12909-024-06389-9
Chinese Library Classification: G40 [Education]
Discipline classification codes: 040101; 120403
Abstract:
Background: AI-powered chatbots have spread to various fields, including dental education and clinical assistance in treatment planning. The aim of this study was to assess and compare the performance of leading AI-powered chatbots on the Turkish dental specialization exam (DUS) and to compare it with that of the year's best performer.
Methods: The 2020 and 2021 DUS questions were submitted individually to ChatGPT-4.0 and Gemini Advanced. Questions were entered manually into each chatbot in their original form, in Turkish. The results were compared with each other and with the year's best performers. Candidates who score at least 45 points on this centralized exam are deemed to have passed and are eligible to select their preferred department and institution. Data were analyzed statistically using Pearson's chi-squared test (p < 0.05).
Results: ChatGPT-4.0 achieved a correct response rate of 83.3% on the 2020 exam, while Gemini Advanced achieved 65%. On the 2021 exam, ChatGPT-4.0 achieved 80.5%, whereas Gemini Advanced achieved 60.2%. ChatGPT-4.0 outperformed Gemini Advanced on both exams (p < 0.05). The chatbots' overall scores (for 2020: ChatGPT-4.0, 65.5 and Gemini Advanced, 50.1; for 2021: ChatGPT-4.0, 65.6 and Gemini Advanced, 48.6) were lower than those of the year's best performers (68.5 points in 2020 and 72.3 points in 2021). This weaker performance also extended to the basic sciences and clinical sciences sections (p < 0.001). Periodontology was the clinical specialty in which both AI-powered chatbots achieved their best results, while the lowest performance was observed in endodontics and orthodontics.
Conclusion: The AI-powered chatbots ChatGPT-4.0 and Gemini Advanced passed the DUS by exceeding the threshold score of 45. However, they still lagged behind the year's top performers, particularly in basic sciences, clinical sciences, and overall score. They also showed lower performance in some clinical specialties, such as endodontics and orthodontics.
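As an illustrative sketch (not taken from the paper), the Pearson chi-squared comparison of the two chatbots' correct-versus-incorrect answer counts could be run as follows in Python with SciPy. The question total and derived counts are hypothetical placeholders chosen to match the reported 2020 percentages, since the abstract reports only rates, not raw counts.

    # Illustrative sketch: Pearson chi-squared comparison of two chatbots'
    # correct/incorrect answer counts on one exam. Counts are hypothetical
    # placeholders (the abstract reports only percentages).
    from scipy.stats import chi2_contingency

    # Rows: ChatGPT-4.0, Gemini Advanced; columns: correct, incorrect.
    # Hypothetical 2020-exam counts assuming 120 scored questions.
    observed = [
        [100, 20],  # ChatGPT-4.0: ~83.3% correct
        [78, 42],   # Gemini Advanced: ~65% correct
    ]

    chi2, p, dof, expected = chi2_contingency(observed)
    print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
    # A p-value below 0.05 would indicate a significant difference in
    # correct response rates between the two chatbots, as reported in the study.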
Pages: 10