Generative Pre-trained Transformer 4 makes cardiovascular magnetic resonance reports easy to understand

Cited by: 13
Authors
Salam, Babak [1 ,2 ]
Kravchenko, Dmitrij [1 ,2 ]
Nowak, Sebastian [1 ,2 ]
Sprinkart, Alois M. [1 ,2 ]
Weinhold, Leonie [3 ]
Odenthal, Anna [1 ]
Mesropyan, Narine [1 ,2 ]
Bischoff, Leon M. [1 ,2 ]
Attenberger, Ulrike [1 ]
Kuetting, Daniel L. [1 ,2 ]
Luetkens, Julian A. [1 ,2 ]
Isaak, Alexander [1 ,2 ]
Affiliations
[1] Univ Hosp Bonn, Dept Diagnost & Intervent Radiol, Venusberg Campus 1, D-53127 Bonn, Germany
[2] Univ Hosp Bonn, Quant Imaging Lab Bonn QILaB, Venusberg Campus 1, D-53127 Bonn, Germany
[3] Univ Hosp Bonn, Dept Med Biometry Informat & Epidemiol, Venusberg Campus 1, D-53127 Bonn, Germany
Keywords
Generative Pre-trained Transformers; Cardiovascular magnetic resonance; Artificial intelligence; Text simplification; Large language models; Radiology reports; Readability
DOI
10.1016/j.jocmr.2024.101035
Chinese Library Classification
R5 [Internal Medicine]
Discipline classification codes
1002; 100201
Abstract
Background: Patients are increasingly using Generative Pre-trained Transformer 4 (GPT-4) to better understand their own radiology findings.
Purpose: To evaluate the performance of GPT-4 in transforming cardiovascular magnetic resonance (CMR) reports into text that is comprehensible to medical laypersons.
Methods: ChatGPT with the GPT-4 architecture was used to generate three different explained versions of each of 20 CMR reports (n = 60) using the same prompt: "Explain the radiology report in a language understandable to a medical layperson". Two cardiovascular radiologists evaluated understandability, factual correctness, completeness of relevant findings, and lack of potential harm, while 13 medical laypersons evaluated the understandability of the original and the GPT-4 reports on a Likert scale (1 "strongly disagree", 5 "strongly agree"). Readability was measured using the Automated Readability Index (ARI). Linear mixed-effects models and the intraclass correlation coefficient (ICC) were used for statistical analysis; values are given as median [interquartile range].
Results: GPT-4 reports were generated in 52 ± 13 s on average. Compared with the original reports, GPT-4 reports achieved a lower ARI score (original 10 [9-12] vs GPT-4 5 [4-6]; p < 0.001) and were subjectively easier for laypersons to understand (original 1 [1] vs GPT-4 4 [4, 5]; p < 0.001). Eighteen out of 20 (90%) standard CMR reports and 2/60 (3%) GPT-generated reports had an ARI score corresponding to the 8th grade level or higher. Radiologists' ratings of the GPT-4 reports reached high levels for correctness (5 [4, 5]), completeness (5 [5]), and lack of potential harm (5 [5]), with "strong agreement" for factual correctness in 94% (113/120) and completeness of relevant findings in 81% (97/120) of reports. Test-retest agreement for layperson understandability ratings between the three simplified reports generated from the same original report was substantial (ICC: 0.62; p < 0.001). Interrater agreement between radiologists was almost perfect for lack of potential harm (ICC: 0.93; p < 0.001) and moderate to substantial for completeness (ICC: 0.76; p < 0.001) and factual correctness (ICC: 0.55; p < 0.001).
Conclusion: GPT-4 can reliably transform complex CMR reports into more understandable, layperson-friendly language while largely maintaining factual correctness and completeness, and can thus help convey patient-relevant radiology information in an easy-to-understand manner.
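The readability comparison in the abstract rests on the Automated Readability Index, which maps character, word, and sentence counts to an approximate US school grade level. The sketch below uses the standard published ARI formula; the abstract does not say which tool the authors used, so the tokenization rules, the function name `automated_readability_index`, and the two sample sentences are illustrative assumptions rather than the study's method or data.

```python
import re


def automated_readability_index(text: str) -> float:
    """Standard ARI: 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43.

    Assumption: words are runs of letters/digits (optionally joined by a
    hyphen or apostrophe) and sentences end at '.', '!' or '?'. Real tools
    may count differently, e.g. around decimals such as "1.5 cm".
    """
    words = re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    # Count only letters and digits as characters.
    characters = sum(len(re.sub(r"[^A-Za-z0-9]", "", w)) for w in words)
    return 4.71 * characters / len(words) + 0.5 * len(words) / len(sentences) - 21.43


# Hypothetical sentences, not taken from the study's reports.
technical = "Late gadolinium enhancement demonstrates subendocardial myocardial infarction."
simplified = "The scan shows a scar in the heart muscle."

print(f"technical:  {automated_readability_index(technical):.1f}")
print(f"simplified: {automated_readability_index(simplified):.1f}")
```

Longer words and longer sentences both raise the score, which is why text rewritten in plain language tends to land at a lower grade level.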
Pages: 8