PERFORMANCE ASSESSMENT OF AN ARTIFICIAL INTELLIGENCE CHATBOT IN CLINICAL VITREORETINAL SCENARIOS

Cited by: 6

Authors:

Maywood, Michael J. [1]

Parikh, Ravi [2,3]

Deobhakta, Avnish [4]

Begaj, Tedi [1,5]

Affiliations:

[1] Corewell Hlth William Beaumont Univ Hosp, Dept Ophthalmol, Royal Oak, MI USA

[2] Manhattan Retina & Eye Consultants, New York, NY USA

[3] NYU, Sch Med, Dept Ophthalmol, New York, NY USA

[4] Icahn Sch Med Mt Sinai, New York, NY USA

[5] Associated Retinal Consultants, 3555 West Thirteen Mile Rd,Suite LL-20, Royal Oak, MI 48073 USA

Source:

RETINA-THE JOURNAL OF RETINAL AND VITREOUS DISEASES | 2024 / Vol. 44 / No. 06

Keywords:

artificial intelligence; chatbot; ChatGPT; ophthalmology; retina

DOI:

10.1097/IAE.0000000000004053

Chinese Library Classification number:

R77 [Ophthalmology]

Discipline classification code:

100212

Abstract:

Supplemental Digital Content is available in the text. In this retrospective cross-sectional study, ChatGPT answered 83% of clinical vitreoretinal scenarios correctly and 52.5% of scenarios comprehensively. ChatGPT is a powerful source of information but must be used with caution given its limitations in providing comprehensive clinical management decisions.

Purpose: To determine how often ChatGPT is able to provide accurate and comprehensive information regarding clinical vitreoretinal scenarios. To assess the types of sources ChatGPT primarily uses and to determine whether they are hallucinated.

Methods: This was a retrospective cross-sectional study. The authors designed 40 open-ended clinical scenarios across four main topics in vitreoretinal disease. Responses were graded on correctness and comprehensiveness by three blinded retina specialists. The primary outcome was the number of clinical scenarios that ChatGPT answered correctly and comprehensively. Secondary outcomes included theoretical harm to patients, the distribution of the type of references used by the chatbot, and the frequency of hallucinated references.

Results: In June 2023, ChatGPT answered 83% of clinical scenarios (33/40) correctly but provided a comprehensive answer in only 52.5% of cases (21/40). Subgroup analysis demonstrated an average correct score of 86.7% in neovascular age-related macular degeneration, 100% in diabetic retinopathy, 76.7% in retinal vascular disease, and 70% in the surgical domain. There were six incorrect responses with one case (16.7%) of no harm, three cases (50%) of possible harm, and two cases (33.3%) of definitive harm.

Conclusion: ChatGPT correctly answered more than 80% of complex open-ended vitreoretinal clinical scenarios, with a reduced capability to provide a comprehensive response.

Pages: 954-964

Page count: 11
