Generative artificial intelligence chatbots may provide appropriate informational responses to common vascular surgery questions by patients

Cited by: 6
Authors
Chervonski, Ethan [1]
Harish, Keerthi B. [1]
Rockman, Caron B. [2]
Sadek, Mikel [2]
Teter, Katherine A. [2]
Jacobowitz, Glenn R. [2]
Berland, Todd L. [2]
Lohr, Joann [3]
Moore, Colleen [4]
Maldonado, Thomas S. [2,5]
Affiliations
[1] New York Univ, Grossman Sch Med, New York, NY USA
[2] New York Univ, Dept Surg, Div Vasc & Endovascular Surg, Langone Hlth, New York, NY USA
[3] Dorn Vet Affairs Med Ctr, Columbia, SC USA
[4] InVein Clin, Cape Girardeau, MO USA
[5] New York Univ, Dept Surg, Div Vasc & Endovascular Surg, Langone Med Ctr, 530 First Ave,Sixth Floor, New York, NY 10016 USA
Keywords
Vascular surgery; artificial intelligence; ChatGPT; Google Bard; patient education; readability; educational attainment; artery disease; quality
DOI
10.1177/17085381241240550
Chinese Library Classification (CLC)
R6 [Surgery]
Subject Classification Codes
1002; 100210
Abstract
Objectives: Generative artificial intelligence (AI) has emerged as a promising tool for engaging with patients. The objective of this study was to assess the quality of AI responses to common patient questions regarding vascular surgery disease processes.

Methods: OpenAI's ChatGPT-3.5 and Google Bard were queried with 24 mock patient questions spanning seven vascular surgery disease domains. Six experienced vascular surgery faculty at a tertiary academic center independently graded AI responses on their accuracy (rated 1-4, from completely inaccurate to completely accurate), completeness (rated 1-4, from totally incomplete to totally complete), and appropriateness (binary). Responses were also evaluated with three readability scales.

Results: ChatGPT responses were rated, on average, more accurate than Bard responses (3.08 ± 0.33 vs 2.82 ± 0.40, p < .01) and more complete (2.98 ± 0.34 vs 2.62 ± 0.36, p < .01). Most ChatGPT responses (75.0%, n = 18) and almost half of Bard responses (45.8%, n = 11) were unanimously deemed appropriate. Almost one-third of Bard responses (29.2%, n = 7) were deemed inappropriate by at least two reviewers, and two Bard responses (8.3%) were considered inappropriate by the majority. The mean Flesch Reading Ease, Flesch-Kincaid Grade Level, and Gunning Fog Index of ChatGPT responses were 29.4 ± 10.8, 14.5 ± 2.2, and 17.7 ± 3.1, respectively, indicating that responses were readable with a post-secondary education. Bard's mean readability scores were 58.9 ± 10.5, 8.2 ± 1.7, and 11.0 ± 2.0, respectively, indicating that responses were readable with a high-school education (p < .0001 for all three metrics). ChatGPT's mean response length (332 ± 79 words) was greater than Bard's (183 ± 53 words, p < .001). Neither accuracy, completeness, readability, nor response length differed across disease domains for either chatbot (p > .05 for all analyses).

Conclusions: AI offers a novel means of educating patients that avoids the inundation of information from "Dr Google" and the time barriers of physician-patient encounters. ChatGPT provides largely valid, though imperfect, responses to myriad patient questions at the expense of readability. While Bard responses are more readable and concise, their quality is poorer. Further research is warranted to better understand failure points for large language models in vascular surgery patient education.
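The abstract reports three readability formulas (Flesch Reading Ease, Flesch-Kincaid Grade Level, Gunning Fog Index), word counts, and pairwise significance tests, but does not name the tooling used to compute them. Below is a minimal sketch of such a scoring pipeline, assuming the Python textstat and scipy packages; the response texts and the choice of an independent-samples t-test are illustrative assumptions, not the authors' method.

```python
# Sketch of a readability-scoring pipeline (assumption: the study does not
# name its tooling; textstat and scipy are stand-ins, and the response
# texts below are hypothetical placeholders).
import textstat
from scipy import stats

# In the study, each list would hold the 24 responses from one chatbot.
chatgpt_responses = [
    "Peripheral artery disease is a narrowing of the arteries ...",
    "An abdominal aortic aneurysm is a bulge in the aorta ...",
]
bard_responses = [
    "PAD happens when arteries in your legs get narrow ...",
    "An aneurysm is a weak, bulging spot in a blood vessel ...",
]

def readability_profile(text: str) -> dict:
    """Compute the three readability scores and word count reported in the study."""
    return {
        "flesch_reading_ease": textstat.flesch_reading_ease(text),
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),
        "gunning_fog": textstat.gunning_fog(text),
        "word_count": textstat.lexicon_count(text, removepunct=True),
    }

# Compare one metric between models. The abstract reports p-values but not
# the test used; an independent-samples t-test is shown as one plausible choice.
fre_gpt = [readability_profile(r)["flesch_reading_ease"] for r in chatgpt_responses]
fre_bard = [readability_profile(r)["flesch_reading_ease"] for r in bard_responses]
t_stat, p_value = stats.ttest_ind(fre_gpt, fre_bard)
print(f"Flesch Reading Ease: t = {t_stat:.2f}, p = {p_value:.4f}")
```

Given samples of 24 responses per model with no guarantee of normality, a nonparametric alternative such as scipy.stats.mannwhitneyu would be an equally defensible choice here.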
Pages: 229-237 (9 pages)