Performance of ChatGPT and Bard in self-assessment questions for nephrology board renewal

Cited by: 10
Authors
Noda, Ryunosuke [1 ]
Izaki, Yuto [1 ]
Kitano, Fumiya [1 ]
Komatsu, Jun [1 ]
Ichikawa, Daisuke [1 ]
Shibagaki, Yugo [1 ]
Affiliations
[1] St Marianna Univ, Dept Internal Med, Div Nephrol & Hypertens, Sch Med, 2-16-1 Sugao,Miyamae Ku, Kawasaki, Kanagawa 2168511, Japan
Keywords
ChatGPT; GPT-4; Large language models; Artificial intelligence; Nephrology
DOI
10.1007/s10157-023-02451-w
CLC Classification
R5 [Internal Medicine]; R69 [Urology (Urogenital System Diseases)]
Subject Classification
1002; 100201
Abstract
Background Large language models (LLMs) have driven recent advances in artificial intelligence. While LLMs have demonstrated high performance on general medical examinations, their performance in specialized areas such as nephrology remains unclear. This study aimed to evaluate ChatGPT and Bard for their potential nephrology applications. Methods Ninety-nine questions from the Self-Assessment Questions for Nephrology Board Renewal from 2018 to 2022 were presented to two versions of ChatGPT (GPT-3.5 and GPT-4) and to Bard. We calculated the correct answer rates overall, for each year, and by question category, and checked whether they exceeded the pass criterion. The correct answer rates were then compared with those of nephrology residents. Results The overall correct answer rates for GPT-3.5, GPT-4, and Bard were 31.3% (31/99), 54.5% (54/99), and 32.3% (32/99), respectively; thus, GPT-4 significantly outperformed both GPT-3.5 (p < 0.01) and Bard (p < 0.01). GPT-4 met the pass criterion in three of the five years, only barely exceeding the minimum threshold in two of them. GPT-4 performed significantly better than GPT-3.5 and Bard on problem-solving, clinical, and non-image questions. GPT-4's performance fell between that of third- and fourth-year nephrology residents. Conclusions GPT-4 outperformed GPT-3.5 and Bard and met the Nephrology Board renewal standards in specific years, albeit marginally. These results highlight both the potential and the limitations of LLMs in nephrology. As LLMs advance, nephrologists should understand their performance characteristics for future applications.
Pages: 465-469 (5 pages)
Related Papers
34 records
  • [21] Artificial Intelligence on the Exam Table: ChatGPT's Advancement in Urology Self-assessment
    Cadiente, Angelo
    Chen, Jamie
    Nguyen, Jennifer
    Sadeghi-Nejad, Hossein
    Billah, Mubashir
    UROLOGY PRACTICE, 2023, 10 (06) : 521 - 523
  • [22] The performance of artificial intelligence models in generating responses to general orthodontic questions: ChatGPT vs Google Bard
    Daraqel, Baraa
    Wafaie, Khaled
    Mohammed, Hisham
    Cao, Li
    Mheissen, Samer
    Liu, Yang
    Zheng, Leilei
    AMERICAN JOURNAL OF ORTHODONTICS AND DENTOFACIAL ORTHOPEDICS, 2024, 165 (06) : 652 - 662
  • [23] Comparative Performance of ChatGPT 3.5 and GPT4 on Rhinology Standardized Board Examination Questions
    Patel, Evan A.
    Fleischer, Lindsay
    Filip, Peter
    Eggerstedt, Michael
    Hutz, Michael
    Michaelides, Elias
    Batra, Pete S.
    Tajudeen, Bobby A.
    OTO OPEN, 2024, 8 (02)
  • [24] The Scientific Knowledge of Bard and ChatGPT in Endocrinology, Diabetes, and Diabetes Technology: Multiple-Choice Questions Examination-Based Performance
    Meo, Sultan Ayoub
    Al-Khlaiwi, Thamir
    Abukhalaf, Abdulelah Adnan
    Meo, Anusha Sultan
    Klonoff, David C.
    JOURNAL OF DIABETES SCIENCE AND TECHNOLOGY, 2023,
  • [25] The Comparative Performance of Large Language Models on the Hand Surgery Self-Assessment Examination
    Chen, Clark J.
    Sobol, Keenan
    Hickey, Connor
    Raphael, James
    HAND-AMERICAN ASSOCIATION FOR HAND SURGERY, 2024,
  • [26] New Artificial Intelligence ChatGPT Performs Poorly on the 2022 Self-assessment Study Program for Urology
    Huynh, Linda My
    Bonebrake, Benjamin T.
    Schultis, Kaitlyn
    Quach, Alan
    Deibert, Christopher M.
    UROLOGY PRACTICE, 2023, 10 (04) : 408 - +
  • [27] The performance of artificial intelligence language models in board-style dental knowledge assessment: A preliminary study on ChatGPT
    Danesh, Arman
    Pazouki, Hirad
    Danesh, Kasra
    Danesh, Farzad
    Danesh, Arsalan
    JOURNAL OF THE AMERICAN DENTAL ASSOCIATION, 2023, 154 (11) : 970 - 974
  • [28] Performance of artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in the American Society for Metabolic and Bariatric Surgery textbook of bariatric surgery questions
    Lee, Yung
    Brar, Karanbir
    Malone, Sarah
    Jin, David
    McKechnie, Tyler
    Jung, James J.
    Kroh, Matthew
    Dang, Jerry T.
    SURGERY FOR OBESITY AND RELATED DISEASES, 2024, 20 (07) : 609 - 613
  • [29] Assessment Study of ChatGPT-3.5's Performance on the Final Polish Medical Examination: Accuracy in Answering 980 Questions
    Siebielec, Julia
    Ordak, Michal
    Oskroba, Agata
    Dworakowska, Anna
    Bujalska-Zadrozny, Magdalena
    HEALTHCARE, 2024, 12 (16)
  • [30] Performance evaluation of ChatGPT-4.0 and Gemini on image-based neurosurgery board practice questions: A comparative analysis
    Mcnulty, Alana M.
    Valluri, Harshitha
    Gajjar, Avi A.
    Custozzo, Amanda
    Field, Nicholas C.
    Paul, Alexandra R.
    JOURNAL OF CLINICAL NEUROSCIENCE, 2025, 134