Performance of ChatGPT and Bard in self-assessment questions for nephrology board renewal

Cited by: 10
Authors:
Noda, Ryunosuke [1 ]
Izaki, Yuto [1 ]
Kitano, Fumiya [1 ]
Komatsu, Jun [1 ]
Ichikawa, Daisuke [1 ]
Shibagaki, Yugo [1 ]
Affiliations:
[1] St Marianna Univ, Dept Internal Med, Div Nephrol & Hypertens, Sch Med, 2-16-1 Sugao, Miyamae-ku, Kawasaki, Kanagawa 216-8511, Japan
Keywords:
ChatGPT; GPT-4; Large language models; Artificial intelligence; Nephrology;
DOI:
10.1007/s10157-023-02451-w
Chinese Library Classification (CLC):
R5 [Internal Medicine]; R69 [Urology (Urogenital Diseases)];
Discipline codes:
1002; 100201;
Abstract:
Background Large language models (LLMs) have driven recent advances in artificial intelligence. While LLMs have demonstrated high performance on general medical examinations, their performance in specialized areas such as nephrology remains unclear. This study aimed to evaluate ChatGPT and Bard for their potential nephrology applications. Methods Ninety-nine questions from the Self-Assessment Questions for Nephrology Board Renewal from 2018 to 2022 were presented to two versions of ChatGPT (GPT-3.5 and GPT-4) and to Bard. We calculated the correct answer rates across the five years, for each year, and by question category, and checked whether they exceeded the passing criterion. These rates were then compared with those of nephrology residents. Results The overall correct answer rates for GPT-3.5, GPT-4, and Bard were 31.3% (31/99), 54.5% (54/99), and 32.3% (32/99), respectively; GPT-4 thus significantly outperformed both GPT-3.5 (p < 0.01) and Bard (p < 0.01). GPT-4 met the passing criterion in three of the five years, only barely clearing the minimum threshold in two of them. GPT-4 also performed significantly better than GPT-3.5 and Bard on problem-solving, clinical, and non-image questions. GPT-4's performance fell between that of third- and fourth-year nephrology residents. Conclusions GPT-4 outperformed GPT-3.5 and Bard and met the Nephrology Board renewal standards in specific years, albeit marginally. These results highlight both the potential and the limitations of LLMs in nephrology. As LLMs advance, nephrologists should understand their performance characteristics for future applications.
Pages: 465-469
Page count: 5