Comparative Analysis of the Response Accuracies of Large Language Models in the Korean National Dental Hygienist Examination Across Korean and English Questions

Cited by: 1
Authors
Song, Eun Sun [1]
Lee, Seung-Pyo [1]
Affiliations
[1] Seoul Natl Univ, Sch Dent, Dent Res Inst, Dept Oral Anat, Seoul, South Korea
Keywords
artificial intelligence; ChatGPT; dental hygienist; Gemini; large language models; licensing examination
DOI
10.1111/idh.12848
Chinese Library Classification
R78 [Stomatology]
Subject classification code
1003
Abstract
Introduction: Large language models such as Gemini, GPT-3.5, and GPT-4 have demonstrated significant potential in the medical field. Their performance in medical licensing examinations globally has highlighted their capabilities in understanding and processing specialized medical knowledge. This study aimed to evaluate and compare the performance of Gemini, GPT-3.5, and GPT-4 in the Korean National Dental Hygienist Examination. The accuracy of answering the examination questions in both Korean and English was assessed.

Methods: This study used a dataset comprising questions from the Korean National Dental Hygienist Examination over 5 years (2019-2023). A two-way analysis of variance (ANOVA) test was employed to investigate the impacts of model type and language on the accuracy of the responses. Questions were input into each model under standardized conditions, and responses were classified as correct or incorrect based on predefined criteria.

Results: GPT-4 consistently outperformed the other models, achieving the highest accuracy rates across both language versions annually. In particular, it showed superior performance in English, suggesting advancements in its training algorithms for language processing. However, all models demonstrated variable accuracies in subjects with localized characteristics, such as health and medical law.

Conclusions: These findings indicate that GPT-4 holds significant promise for application in medical education and standardized testing, especially in English. However, the variability in performance across different subjects and languages underscores the need for ongoing improvements and the inclusion of more diverse and localized training datasets to enhance the models' effectiveness in multilingual and multicultural contexts.
Pages: 10