The performance of ChatGPT-4 and Bing Chat in frequently asked questions about glaucoma

Cited by: 3
Authors
Dogan, Levent [1 ]
Yilmaz, Ibrahim Edhem [1 ]
Affiliations
[1] Kilis State Hosp, Dept Ophthalmol, TR-79000 Kilis, Turkiye
Keywords
Artificial intelligence; ChatGPT; Bing Chat; readability tests; glaucoma; frequently asked questions;
DOI
10.1177/11206721251321197
Chinese Library Classification
R77 [Ophthalmology];
Discipline code
100212;
Abstract
Purpose: To evaluate the appropriateness and readability of the responses generated by ChatGPT-4 and Bing Chat to frequently asked questions about glaucoma.
Method: Thirty-four questions were generated for this study. Each question was posed three times to a fresh ChatGPT-4 and Bing Chat interface. Two glaucoma specialists categorised the responses by appropriateness. Accuracy of the responses was evaluated using the Structure of the Observed Learning Outcome (SOLO) taxonomy. Readability of the responses was assessed using the Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Coleman-Liau Index (CLI), Simple Measure of Gobbledygook (SMOG), and Gunning-Fog Index (GFI) tests.
Results: The percentage of appropriate responses was 88.2% (30/34) for ChatGPT-4 and 79.2% (27/34) for Bing Chat. Each interface provided at least one inappropriate response to 1 of the 34 questions. SOLO scores were 3.86 ± 0.41 for ChatGPT-4 and 3.70 ± 0.52 for Bing Chat; no statistically significant difference in performance was observed between the two LLMs (p = 0.101). The mean word count of the responses was 316.5 (± 85.1) for ChatGPT-4 and 61.6 (± 25.8) for Bing Chat (p < 0.05). According to FRE scores, the generated responses were suitable for only 4.5% (ChatGPT-4) and 33% (Bing Chat) of U.S. adults (p < 0.05).
Conclusions: ChatGPT-4 and Bing Chat consistently provided appropriate responses to the questions. Both LLMs had low readability scores, with ChatGPT-4's responses being the more difficult to read.
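The five readability tests named in the abstract are closed-form formulas over simple text statistics (words, sentences, syllables, letters, polysyllabic words). A minimal sketch of the standard published formulas is below; the counts are taken as inputs because counting syllables and "complex" words reliably requires a dictionary or heuristic tokenizer, which is the error-prone part in practice:

```python
# Readability formulas used in the study, written as closed-form
# functions of pre-computed text statistics.
import math

def flesch_reading_ease(words, sentences, syllables):
    # Higher = easier; scores of 60-70 correspond to "plain English".
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    # Approximates the U.S. school grade needed to read the text.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def coleman_liau_index(words, sentences, letters):
    # L = mean letters per 100 words, S = mean sentences per 100 words.
    L = letters / words * 100
    S = sentences / words * 100
    return 0.0588 * L - 0.296 * S - 15.8

def smog(sentences, polysyllables):
    # Polysyllables = words of 3+ syllables; normalised to a 30-sentence sample.
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291

def gunning_fog(words, sentences, complex_words):
    # Complex words = words of 3+ syllables (with the usual exclusions).
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

# Example: a 100-word passage with 5 sentences and 150 syllables lands
# in the "fairly difficult" FRE band.
fre = flesch_reading_ease(100, 5, 150)   # ~59.6
```

Note how FRE and FKGL move in opposite directions for the same inputs: more syllables per word lowers FRE (harder) while raising the FKGL grade estimate, which is why the study can report both a low ease score and a high grade level for the same responses.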
Pages: 1323-1328
Page count: 6