Performance of ChatGPT in ophthalmology exam; human versus AI

Cited by: 5
Authors
Balci, Ali Safa [1,2]
Yazar, Zeliha [3 ]
Ozturk, Banu Turgut [4 ]
Altan, Cigdem [1 ]
Affiliations
[1] Univ Hlth Sci, Beyoglu Eye Training & Res Hosp, Dept Ophthalmol, TR-34420 Istanbul, Turkiye
[2] Univ Hlth Sci, Sancaktepe Prof Dr Ilhan Varank Training & Res Hosp, Dept Ophthalmol, Istanbul, Turkiye
[3] Univ Hlth Sci, Ankara City Hosp, Dept Ophthalmol, Ankara, Turkiye
[4] Selcuk Univ, Fac Med, Dept Ophthalmol, Konya, Turkiye
Keywords
Artificial intelligence; ChatGPT; Education; Exam; Resident
DOI
10.1007/s10792-024-03353-w
Chinese Library Classification
R77 [Ophthalmology]
Subject Classification Code
100212
Abstract
Purpose: This cross-sectional study evaluates the success rate of ChatGPT in answering questions from the 'Resident Training Development Exam' and compares these results with the performance of ophthalmology residents.
Methods: The 75 exam questions, spanning nine sections and three difficulty levels, were presented to ChatGPT, and its answers and explanations were recorded. The readability and complexity of the explanations were analyzed, and the Flesch Reading Ease (FRE) score (0-100) was obtained with the program Readable. Residents were categorized into four groups by seniority, and their overall and seniority-specific success rates were each compared with ChatGPT's.
Results: Out of 69 questions, ChatGPT answered 37 correctly (53.62%). Its highest success rate was in Lens and Cataract (77.77%) and its lowest in Pediatric Ophthalmology and Strabismus (0.00%). Among 789 residents, overall accuracy was 50.37%; seniority-specific accuracy was 43.49%, 51.30%, 54.91%, and 60.05% for 1st- to 4th-year residents, respectively. ChatGPT ranked 292nd among the residents. By difficulty, 11 questions were easy, 44 moderate, and 14 difficult, and ChatGPT's accuracy at each level was 63.63%, 54.54%, and 42.85%, respectively. The mean FRE score of ChatGPT's responses was 27.56 ± 12.40.
Conclusion: ChatGPT correctly answered 53.6% of questions in an exam written for residents, a success rate below the average of a 3rd-year resident. The readability of ChatGPT's responses is low, making them difficult to understand, and its success decreases as question difficulty increases. Predictably, these results will change as more information is loaded into ChatGPT.
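The mean FRE of 27.56 reported above was obtained with the Readable program; as an illustration of what that 0-100 scale measures, the Python sketch below applies the standard Flesch Reading Ease formula using a crude vowel-group syllable heuristic. The helper names, the heuristic, and the sample sentence are assumptions for demonstration only, not the study's actual pipeline, so its scores will only approximate Readable's.

import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count vowel groups, subtract one for a trailing silent 'e'.
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    count = len(groups)
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    # Standard formula: FRE = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

# Hypothetical example in the style of a ChatGPT explanation (not taken from the study):
sample = ("Phacoemulsification fragments the crystalline lens with ultrasonic energy, "
          "permitting aspiration of the fragments through a small corneal incision.")
print(f"FRE: {flesch_reading_ease(sample):.2f}")  # 0-100 scale; lower values are harder to read

On the FRE scale, scores below about 30 are conventionally interpreted as very difficult text best understood by university graduates, which is consistent with the study's conclusion that ChatGPT's explanations are hard to read.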
Pages: 8