Artificial Intelligence in Ophthalmology: A Comparative Analysis of GPT-3.5, GPT-4, and Human Expertise in Answering StatPearls Questions

Cited by: 80
Authors
Moshirfar, Majid [1 ,2 ,3 ]
Altaf, Amal W. [4 ]
Stoakes, Isabella M. [5 ,6 ]
Tuttle, Jared J. [7 ]
Hoopes, Phillip C. [6 ]
Affiliations
[1] Hoopes Vis Res Ctr, Corneal & Refract Surg, Draper, UT 84020 USA
[2] Univ Utah, Ophthalmol, Salt Lake City, UT 84112 USA
[3] Utah Lions Eye Bank, Eye Banking & Corneal Transplantat, Murray, UT 84107 USA
[4] Univ Arizona, Coll Med Phoenix, Med Sch, Phoenix, AZ 85721 USA
[5] Pacific Northwest Univ Hlth Sci, Med Sch, Yakima, WA USA
[6] Hoopes Vis Res Ctr, Ophthalmol, Draper, UT 84020 USA
[7] Univ Texas Hlth Sci Ctr San Antonio, Med Sch, San Antonio, TX 78229 USA
Keywords
cornea; chatgpt-4; chatgpt-3.5; conversational generative pre-trained transformer; chatbot; ophthalmology; clinical decision-making; conversational ai; statpearls; artificial intelligence;
DOI
10.7759/cureus.40822
Chinese Library Classification
R5 [Internal Medicine];
Subject Classification Code
1002; 100201;
Abstract
Importance: Chat Generative Pre-Trained Transformer (ChatGPT) has shown promising performance in various fields, including medicine, business, and law, but its accuracy on specialty-specific medical questions, particularly in ophthalmology, remains uncertain.
Purpose: This study evaluates the performance of two ChatGPT models (GPT-3.5 and GPT-4) and human professionals in answering ophthalmology questions from the StatPearls question bank, assesses their outcomes, and provides insights into the integration of artificial intelligence (AI) technology in ophthalmology.
Methods: ChatGPT's performance was evaluated using 467 ophthalmology questions from the StatPearls question bank. These questions were stratified into 11 subcategories, four difficulty levels, and three generalized anatomical categories. The answer accuracy of GPT-3.5, GPT-4, and human participants was assessed. Statistical analysis comprised the Kolmogorov-Smirnov test for normality, one-way analysis of variance (ANOVA) for the statistical significance of GPT-3.5 versus GPT-4 versus human performance, and repeated unpaired two-sample t-tests to compare the means of two groups.
Results: GPT-4 outperformed both GPT-3.5 and human professionals on ophthalmology StatPearls questions, except in the "Lens and Cataract" category. The performance differences were statistically significant overall, with GPT-4 achieving higher accuracy (73.2%) than GPT-3.5 (55.5%, p < 0.001) and humans (58.3%, p < 0.001). Performance varied across difficulty levels (rated one to four), but GPT-4 consistently outperformed both GPT-3.5 and humans on level-two, -three, and -four questions. On level-four questions, human performance significantly exceeded that of GPT-3.5 (p = 0.008).
Conclusion: The study's findings demonstrate GPT-4's significant performance improvements over both GPT-3.5 and human professionals on StatPearls ophthalmology questions. Our results highlight the potential of advanced conversational AI systems to serve as important tools in medical education and practice.
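For illustration, the sketch below shows the statistical pipeline the Methods describe (Kolmogorov-Smirnov normality testing, one-way ANOVA across the three answer groups, and pairwise unpaired two-sample t-tests), implemented in Python with SciPy. The 0/1 correctness arrays are hypothetical placeholders seeded to roughly match the reported overall accuracies; they are not the study's data, and details the abstract leaves unspecified (for example, Student's versus Welch's t-test) are assumptions.

# Minimal sketch of the abstract's statistical pipeline using SciPy.
# The correctness data below are hypothetical placeholders, not the study's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 467  # number of StatPearls ophthalmology questions in the study

# Hypothetical 0/1 correctness vectors seeded near the reported accuracies.
gpt35 = rng.binomial(1, 0.555, n)  # GPT-3.5: 55.5% overall accuracy
gpt4 = rng.binomial(1, 0.732, n)   # GPT-4: 73.2% overall accuracy
human = rng.binomial(1, 0.583, n)  # humans: 58.3% overall accuracy

# Kolmogorov-Smirnov test for normality, one group at a time.
for name, x in [("GPT-3.5", gpt35), ("GPT-4", gpt4), ("Human", human)]:
    d, p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
    print(f"KS normality, {name}: D={d:.3f}, p={p:.3g}")

# One-way ANOVA across the three groups.
f_stat, p_anova = stats.f_oneway(gpt35, gpt4, human)
print(f"ANOVA: F={f_stat:.2f}, p={p_anova:.3g}")

# Pairwise unpaired two-sample t-tests (Student's t assumed here; the
# abstract does not state whether equal variances were assumed).
pairs = [("GPT-4", gpt4, "GPT-3.5", gpt35),
         ("GPT-4", gpt4, "Human", human),
         ("Human", human, "GPT-3.5", gpt35)]
for a_name, a, b_name, b in pairs:
    t, p = stats.ttest_ind(a, b)
    print(f"t-test {a_name} vs {b_name}: t={t:.2f}, p={p:.3g}")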
Pages: 9