How does ChatGPT perform on the European Board of Pediatric Surgery examination? A randomized comparative study

Times Cited: 2
Authors
Azizoglu, Mustafa [1 ]
Aydogdu, Bahattin [2 ]
Affiliations
[1] Dicle Univ, Med Sch, Dept Pediat Surg, Diyarbakir, Turkiye
[2] Balikesir Univ, Dept Pediat Surg, Balikesir, Turkiye
Source
MEDICINA BALEAR | 2024, Vol. 39, Issue 01
Keywords
ChatGPT; Pediatric Surgery; exam; questions; artificial intelligence;
DOI
10.3306/AJHS.2024.39.01.23
Chinese Library Classification
R5 [Internal Medicine];
Discipline Code
1002; 100201;
Abstract
Purpose: The purpose of this study was to conduct a detailed comparison of the accuracy and responsiveness of GPT-3.5 and GPT-4 in the realm of pediatric surgery. Specifically, we sought to assess their ability to correctly answer a series of sample questions from the European Board of Pediatric Surgery (EBPS) examination. Methods: This study was conducted between 20 May 2023 and 30 May 2023 and undertook a comparative analysis of two AI language models, GPT-3.5 and GPT-4, in the field of pediatric surgery, specifically in the context of EBPS exam sample questions. Two sets of 105 EBPS sample questions each (210 in total) were collated. Results: In General Pediatric Surgery, GPT-3.5 provided correct answers for 7 questions (46.7%), while GPT-4 had a higher accuracy with 13 correct responses (86.7%) (p=0.020). For Newborn Surgery and Pediatric Urology, GPT-3.5 correctly answered 6 questions (40.0%), while GPT-4 correctly answered 12 (80.0%) (p=0.025). In total, GPT-3.5 correctly answered 46 of 105 questions (43.8%), whereas GPT-4 showed significantly better performance, correctly answering 80 questions (76.2%) (p<0.001). Across all responses, the odds ratio for GPT-4 versus GPT-3.5 was 4.1, indicating that GPT-4 was 4.1 times more likely to provide a correct answer to the pediatric surgery questions than GPT-3.5. Conclusion: This comparative study concludes that GPT-4 significantly outperforms GPT-3.5 in responding to EBPS exam questions.
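The reported odds ratio can be sanity-checked directly from the counts stated in the abstract (80/105 correct for GPT-4, 46/105 for GPT-3.5). This is a minimal sketch assuming the standard 2x2 contingency layout; it is not taken from the paper's own analysis code.

```python
# Reproduce the reported odds ratio from the abstract's totals.
# Counts come from the abstract; the 2x2 layout is an assumption.
gpt4_correct, gpt35_correct = 80, 46
total = 105

gpt4_wrong = total - gpt4_correct    # 25
gpt35_wrong = total - gpt35_correct  # 59

# Odds of a correct answer for GPT-4 vs. GPT-3.5:
# (correct/wrong for GPT-4) divided by (correct/wrong for GPT-3.5)
odds_ratio = (gpt4_correct * gpt35_wrong) / (gpt4_wrong * gpt35_correct)
print(f"OR = {odds_ratio:.1f}")  # OR = 4.1

# The stated accuracies also follow from the same counts.
print(f"{gpt35_correct / total:.1%}")  # 43.8%
print(f"{gpt4_correct / total:.1%}")   # 76.2%
```

The result matches the abstract's figure of 4.1, so the reported odds ratio is consistent with the raw counts.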
Pages: 23-26
Page Count: 4