Performance of artificial intelligence on Turkish dental specialization exam: can ChatGPT-4.0 and gemini advanced achieve comparable results to humans?
Cited by: 1
Authors: Sismanoglu, Soner [1]; Capan, Belen Sirinoglu [2]
Affiliations:
[1] Istanbul Univ Cerrahpasa, Fac Dent, Dept Restorat Dent, Istanbul, Turkiye
[2] Istanbul Univ Cerrahpasa, Fac Dent, Dept Pediat Dent, Istanbul, Turkiye
Keywords: AI; Artificial Intelligence; ChatGPT; Dentistry; Gemini; Large language models
DOI: 10.1186/s12909-024-06389-9
Chinese Library Classification: G40 [Education]
Discipline classification codes: 040101; 120403
Abstract:
Background: AI-powered chatbots have spread to various fields, including dental education and clinical assistance in treatment planning. The aim of this study was to assess and compare the performance of leading AI-powered chatbots on the Turkish dental specialization exam (DUS) and to compare it with that of the year's best performer.
Methods: The 2020 and 2021 DUS questions were submitted individually to ChatGPT-4.0 and Gemini Advanced. Questions were entered manually into each chatbot in their original form, in Turkish. The results were compared with each other and with the year's best performers. Candidates who score at least 45 points on this centralized exam are deemed to have passed and are eligible to select their preferred department and institution. Data were analyzed statistically using Pearson's chi-squared test (p < 0.05).
Results: ChatGPT-4.0 achieved a correct response rate of 83.3% on the 2020 exam, while Gemini Advanced achieved 65%. On the 2021 exam, ChatGPT-4.0 achieved 80.5%, whereas Gemini Advanced achieved 60.2%. ChatGPT-4.0 outperformed Gemini Advanced on both exams (p < 0.05). The chatbots' overall scores (for 2020: ChatGPT-4.0, 65.5 and Gemini Advanced, 50.1; for 2021: ChatGPT-4.0, 65.6 and Gemini Advanced, 48.6) were lower than those of the year's best performers (68.5 points in 2020 and 72.3 points in 2021). This weaker performance also extended to the basic sciences and clinical sciences sections (p < 0.001). Periodontology was the clinical specialty in which both AI-powered chatbots achieved their best results, while the lowest performance was observed in endodontics and orthodontics.
Conclusion: The AI-powered chatbots ChatGPT-4.0 and Gemini Advanced passed the DUS by exceeding the threshold score of 45. However, they still lagged behind the year's top performers, particularly in basic sciences, clinical sciences, and overall score. They also showed lower performance in some clinical specialties, such as endodontics and orthodontics.
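As an illustrative sketch (not taken from the paper), the Pearson chi-squared comparison of the two chatbots' correct-versus-incorrect answer counts could be run as follows in Python with SciPy. The question total and derived counts are hypothetical placeholders chosen to match the reported 2020 percentages, since the abstract reports only rates, not raw counts.

    # Illustrative sketch: Pearson chi-squared comparison of two chatbots'
    # correct/incorrect answer counts on one exam. Counts are hypothetical
    # placeholders (the abstract reports only percentages).
    from scipy.stats import chi2_contingency

    # Rows: ChatGPT-4.0, Gemini Advanced; columns: correct, incorrect.
    # Hypothetical 2020-exam counts assuming 120 scored questions.
    observed = [
        [100, 20],  # ChatGPT-4.0: ~83.3% correct
        [78, 42],   # Gemini Advanced: ~65% correct
    ]

    chi2, p, dof, expected = chi2_contingency(observed)
    print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
    # A p-value below 0.05 would indicate a significant difference in
    # correct response rates between the two chatbots, as reported in the study.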
Pages: 10