Artificial intelligence chatbots as sources of patient education material for obstructive sleep apnoea: ChatGPT versus Google Bard

Cited by: 42
Authors
Cheong, Ryan Chin Taw [1]
Unadkat, Samit [2]
Mcneillis, Venkata [2]
Williamson, Andrew [1]
Joseph, Jonathan [2]
Randhawa, Premjit [2]
Andrews, Peter [2]
Paleri, Vinidh [1]
Affiliations
[1] Royal Marsden NHS Fdn Trust, Otolaryngol Head & Neck Surg Dept, Fulham Rd, London SW3 6JJ, England
[2] Univ Coll London Hosp NHS Fdn Trust, Royal Natl ENT & Eastman Dent Hosp, Otolaryngol Head & Neck Surg Dept, London, England
Keywords
Artificial intelligence; Large language models; ChatGPT; Google Bard; Obstructive sleep apnoea; Patient education material
DOI
10.1007/s00405-023-08319-9
Chinese Library Classification number
R76 [Otorhinolaryngology]
Discipline classification code
100213
Abstract
Purpose: To perform the first head-to-head comparative evaluation of patient education material for obstructive sleep apnoea (OSA) generated by two artificial intelligence chatbots, ChatGPT and its primary rival Google Bard.
Methods: Fifty frequently asked questions on obstructive sleep apnoea in English were extracted from the patient information webpages of four major sleep organizations and categorized as input prompts. ChatGPT and Google Bard responses were selected and independently rated using the Patient Education Materials Assessment Tool-Printable (PEMAT-P) Auto-Scoring Form by two otolaryngologists with a Fellowship of the Royal College of Surgeons (FRCS) and a special interest in sleep medicine and surgery. As a secondary outcome, responses were subjectively screened for any incorrect or dangerous information. The Flesch-Kincaid Calculator was used to evaluate the readability of the responses from both ChatGPT and Google Bard.
Results: A total of 46 questions were curated and categorized into three domains: condition (n = 14), investigation (n = 9) and treatment (n = 23). Understandability scores for ChatGPT versus Google Bard by domain were: condition 90.86% vs. 76.32% (p < 0.001); investigation 89.94% vs. 71.67% (p < 0.001); treatment 90.78% vs. 73.74% (p < 0.001). Actionability scores for ChatGPT versus Google Bard by domain were: condition 77.14% vs. 51.43% (p < 0.001); investigation 72.22% vs. 54.44% (p = 0.05); treatment 73.04% vs. 54.78% (p = 0.002). The mean Flesch-Kincaid Grade Level was 9.0 for ChatGPT and 5.9 for Google Bard. No incorrect or dangerous information was identified in any of the responses generated by either ChatGPT or Google Bard.
Conclusion: Evaluation of ChatGPT and Google Bard patient education material for OSA indicates that the former offers superior information across several domains.
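For readers unfamiliar with the readability metric named in the abstract, the following is a minimal sketch of the standard Flesch-Kincaid Grade Level formula. The abstract does not say which calculator implementation the authors used or how it counts syllables, so the function name and the input counts below are hypothetical and purely illustrative.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level computed from raw text counts.

    The counts are supplied by the caller; how a given calculator tokenizes
    words or counts syllables is an assumption outside this sketch.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    # Longer sentences and more syllables per word both raise the grade level.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a short chatbot response (not data from the study).
grade = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
print(f"Estimated grade level: {grade:.2f}")
```

A higher value corresponds to text that needs more years of schooling to read comfortably, which is how the reported means of 9.0 and 5.9 should be interpreted.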
Pages: 985-993
Number of pages: 9