Performance of an Artificial Intelligence Chatbot in Ophthalmic Knowledge Assessment

Cited by: 186
Authors
Mihalache, Andrew [1 ]
Popovic, Marko M. [2 ]
Muni, Rajeev H. [2 ,3 ]
Affiliations
[1] Univ Western Ontario, Schulich Sch Med & Dent, London, ON, Canada
[2] Univ Toronto, Dept Ophthalmol & Vis Sci, Toronto, ON, Canada
[3] St Michaels Hosp Unity Hlth Toronto, Dept Ophthalmol, 30 Bond St,Donnelly Wing,8th Floor, Toronto, ON M5B 1W8, Canada
Keywords
TABLES
DOI
10.1001/jamaophthalmol.2023.1144
Chinese Library Classification
R77 [Ophthalmology]
Subject Classification Code
100212
Abstract
Importance: ChatGPT is an artificial intelligence (AI) chatbot with significant societal implications. AI-based training curricula are being developed in medicine, yet the performance of chatbots in ophthalmology has not been characterized.

Objective: To assess the performance of ChatGPT in answering practice questions for board certification in ophthalmology.

Design, Setting, and Participants: This cross-sectional study used a consecutive sample of text-based multiple-choice questions from the OphthoQuestions practice question bank for board certification examination preparation. Of 166 available multiple-choice questions, 125 (75%) were text-based.

Exposures: ChatGPT answered questions from January 9 to 16, 2023, and on February 17, 2023.

Main Outcomes and Measures: The primary outcome was the number of board certification examination practice questions that ChatGPT answered correctly. Secondary outcomes were the proportion of questions for which ChatGPT provided additional explanations, the mean length of questions and of ChatGPT's responses, ChatGPT's performance on questions without multiple-choice options, and changes in performance over time.

Results: In January 2023, ChatGPT correctly answered 58 of 125 questions (46%). It performed best in general medicine (11/14; 79%) and worst in retina and vitreous (0%). The proportion of questions for which ChatGPT provided additional explanations was similar between questions answered correctly and incorrectly (difference, 5.82%; 95% CI, -11.0% to 22.0%; χ²₁ = 0.45; P = .51). Mean question length was similar between questions answered correctly and incorrectly (difference, 21.4 characters; SE, 36.8; 95% CI, -51.4 to 94.3; t = 0.58; df = 123; P = .22), as was mean response length (difference, -80.0 characters; SE, 65.4; 95% CI, -209.5 to 49.5; t = -1.22; df = 123; P = .22). ChatGPT selected the same multiple-choice response as the most common answer provided by ophthalmology trainees on OphthoQuestions 44% of the time. In February 2023, ChatGPT provided a correct response to 73 of 125 multiple-choice questions (58%) and to 42 of 78 stand-alone questions without multiple-choice options (54%).

Conclusions and Relevance: ChatGPT answered approximately half of the questions in the OphthoQuestions free trial for ophthalmic board certification preparation correctly. Medical professionals and trainees should appreciate the advances of AI in medicine while recognizing that ChatGPT, as used in this investigation, did not answer enough multiple-choice questions correctly to provide substantial assistance in preparing for board certification at this time.
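As an illustration (not part of the original article), the following minimal Python sketch shows how the headline accuracy figures could be restated with confidence intervals and how the January-versus-February accuracy could be compared from the reported counts alone. The Wilson interval and the 2x2 chi-square test used here are standard choices for summary data, not necessarily the authors' exact methods.

import math
from scipy.stats import chi2_contingency

def wilson_ci(k, n, z=1.96):
    # Wilson score 95% CI for a proportion k/n; a common choice for
    # small-to-moderate samples (assumption: not the authors' method).
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# Counts reported in the abstract.
for label, k, n in [("Jan 2023, multiple choice", 58, 125),
                    ("Feb 2023, multiple choice", 73, 125),
                    ("Feb 2023, stand-alone", 42, 78)]:
    lo, hi = wilson_ci(k, n)
    print(f"{label}: {k}/{n} = {k/n:.0%} (95% CI {lo:.0%}-{hi:.0%})")

# 2x2 table (correct, incorrect) comparing January vs February accuracy
# on the same 125 multiple-choice questions.
table = [[58, 125 - 58], [73, 125 - 73]]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}")

Note that this cross-period comparison treats the two runs as independent samples; a paired analysis of per-question agreement would be more faithful to the repeated-measures design but requires the question-level data.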
Pages: 589-597
Page count: 9