Evaluation of an Artificial Intelligence Chatbot for Delivery of IR Patient Education Material: A Comparison with Societal Website Content

Cited by: 35
Authors
McCarthy, Colin J. [1 ]
Berkowitz, Seth [1 ]
Ramalingam, Vijay [1 ]
Ahmed, Muneeb [1 ]
Affiliation
[1] Harvard Med Sch, Beth Israel Deaconess Med Ctr, Div Vasc & Intervent Radiol, Rosenberg 3, One Deaconess Rd, Boston, MA 02215 USA
Keywords
Readability formula; Management
DOI
10.1016/j.jvir.2023.05.037
Chinese Library Classification (CLC)
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Subject Classification Codes
1002; 100207; 1009
Abstract
Purpose: To assess the accuracy, completeness, and readability of patient educational material produced by a machine learning model and to compare the output with that provided by a societal website.

Materials and Methods: Content from the Society of Interventional Radiology Patient Center website was retrieved, categorized, and organized into discrete questions. These questions were entered into the ChatGPT platform, and the output was analyzed for word and sentence counts, readability on multiple validated scales, factual correctness, and suitability for patient education using the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P).

Results: A total of 21,154 words were analyzed: 7,917 from the website and 13,377 from the ChatGPT output across 22 text passages. Compared with the societal website, the ChatGPT output was longer and more difficult to read on 4 of 5 readability scales. The ChatGPT output was incorrect for 12 (11.5%) of 104 questions. When reviewed with the PEMAT-P instrument, the ChatGPT content scored lower than the website material. Content from both the website and ChatGPT was significantly above the recommended fifth- or sixth-grade reading level for patient education, with a mean Flesch-Kincaid grade level of 11.1 (±1.3) for the website and 11.9 (±1.6) for the ChatGPT content.

Conclusions: The ChatGPT platform may produce incomplete or inaccurate patient educational content, and providers should be familiar with the limitations of the system in its current form. Opportunities may exist to fine-tune existing large language models so that they are optimized for the delivery of patient educational content.
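Of the five readability scales, the abstract names only the Flesch-Kincaid grade level, and it does not identify the scoring software. The minimal Python sketch below shows how the reported word/sentence counts and grade-level scoring could be reproduced; the use of the open-source textstat package and the choice of the four unnamed scales (Gunning Fog, SMOG, Coleman-Liau, Dale-Chall) are assumptions for illustration, not the authors' documented method.

import textstat

def readability_report(text: str) -> dict:
    """Word/sentence counts plus five readability scales for one passage."""
    return {
        "words": textstat.lexicon_count(text),
        "sentences": textstat.sentence_count(text),
        # Flesch-Kincaid grade = 0.39*(words/sentences)
        #   + 11.8*(syllables/words) - 15.59; patient education material
        #   is recommended to score near the fifth or sixth grade.
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),
        "gunning_fog": textstat.gunning_fog(text),                   # assumed scale
        "smog_index": textstat.smog_index(text),                     # assumed scale
        "coleman_liau_index": textstat.coleman_liau_index(text),     # assumed scale
        "dale_chall": textstat.dale_chall_readability_score(text),   # assumed scale
    }

sample = ("Uterine fibroid embolization is a minimally invasive procedure "
          "that blocks the blood supply to fibroids in the uterus.")
for scale, score in readability_report(sample).items():
    print(f"{scale}: {score}")

Applied to each of the 22 website passages and the matching ChatGPT responses, per-passage scores of this kind could then be averaged and compared as in the Results above.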
Pages: 1760+
Page count: 41