How does ChatGPT perform on the European Board of Pediatric Surgery examination? A randomized comparative study

Cited by: 2
Authors
Azizoglu, Mustafa [1 ]
Aydogdu, Bahattin [2 ]
Affiliations
[1] Dicle Univ, Med Sch, Dept Pediat Surg, Diyarbakir, Turkiye
[2] Balikesir Univ, Dept Pediat Surg, Balikesir, Turkiye
Source
MEDICINA BALEAR | 2024, Vol. 39, No. 1
Keywords
ChatGPT; Pediatric Surgery; exam; questions; artificial intelligence;
DOI
10.3306/AJHS.2024.39.01.23
Chinese Library Classification: R5 [Internal Medicine]
Discipline codes: 1002; 100201
Abstract
Purpose: The purpose of this study was to conduct a detailed comparison of the accuracy and responsiveness of GPT-3.5 and GPT-4 in the realm of pediatric surgery. Specifically, we sought to assess their ability to correctly answer a series of sample questions from the European Board of Pediatric Surgery (EBPS) examination. Methods: This study was conducted between 20 May 2023 and 30 May 2023 and undertook a comparative analysis of two AI language models, GPT-3.5 and GPT-4, in the field of pediatric surgery, particularly in the context of EBPS exam sample questions. Two sets of 105 sample questions (210 in total), derived from the EBPS sample questions, were collated. Results: In General Pediatric Surgery, GPT-3.5 provided correct answers for 7 questions (46.7%), while GPT-4 had a higher accuracy with 13 correct responses (86.7%) (p=0.020). For Newborn Surgery and Pediatric Urology, GPT-3.5 correctly answered 6 questions (40.0%), whereas GPT-4 correctly answered 12 questions (80.0%) (p=0.025). In total, GPT-3.5 correctly answered 46 questions out of 105 (43.8%), and GPT-4 showed significantly better performance, correctly answering 80 questions (76.2%) (p<0.001). Across all responses, the odds ratio for GPT-4 compared with GPT-3.5 was 4.1, meaning GPT-4 was 4.1 times more likely to provide a correct answer to the pediatric surgery questions than GPT-3.5. Conclusion: This comparative study concludes that GPT-4 significantly outperforms GPT-3.5 in responding to EBPS exam questions.
Pages: 23-26
Page count: 4