Evaluating the performance of Generative Pre-trained Transformer-4 (GPT-4) in standardizing radiology reports

Cited by: 23
Authors
Hasani, Amir M. [1]
Singh, Shiva [2]
Zahergivar, Aryan [2]
Ryan, Beth [3]
Nethala, Daniel [3]
Bravomontenegro, Gabriela [3]
Mendhiratta, Neil [3]
Ball, Mark [3]
Farhadi, Faraz [2]
Malayeri, Ashkan [2]
Affiliations
[1] NHLBI, Lab Translat Res, NIH, Bethesda, MD USA
[2] NIH, Radiol & Imaging Sci Dept, Clin Ctr, Bethesda, MD 20892 USA
[3] NCI, Urol Oncol Branch, NIH, Bethesda, MD USA
Funding
National Institutes of Health (NIH), USA
Keywords
Artificial intelligence; Natural language processing; Digital health; Machine learning;
DOI
10.1007/s00330-023-10384-x
Chinese Library Classification (CLC) number
R8 [Special medicine]; R445 [Diagnostic imaging]
Discipline classification code
1002; 100207; 1009
Abstract
Objective: Radiology reporting is an essential component of clinical diagnosis and decision-making. With the advent of advanced artificial intelligence (AI) models like GPT-4 (Generative Pre-trained Transformer 4), there is growing interest in evaluating their potential for optimizing or generating radiology reports. This study aimed to compare the quality and content of radiologist-generated and GPT-4 AI-generated radiology reports.

Methods: A comparative study design was employed in the study, where a total of 100 anonymized radiology reports were randomly selected and analyzed. Each report was processed by GPT-4, resulting in the generation of a corresponding AI-generated report. Quantitative and qualitative analysis techniques were utilized to assess similarities and differences between the two sets of reports.

Results: The AI-generated reports showed comparable quality to radiologist-generated reports in most categories. Significant differences were observed in clarity (p = 0.027), ease of understanding (p = 0.023), and structure (p = 0.050), favoring the AI-generated reports. AI-generated reports were more concise, with 34.53 fewer words and 174.22 fewer characters on average, but had greater variability in sentence length. Content similarity was high, with an average Cosine Similarity of 0.85, Sequence Matcher Similarity of 0.52, BLEU Score of 0.5008, and BERTScore F1 of 0.8775.

Conclusion: The results of this proof-of-concept study suggest that GPT-4 can be a reliable tool for generating standardized radiology reports, offering potential benefits such as improved efficiency, better communication, and simplified data extraction and analysis. However, limitations and ethical implications must be addressed to ensure the safe and effective implementation of this technology in clinical practice.

Clinical relevance statement: The findings of this study suggest that GPT-4 (Generative Pre-trained Transformer 4), an advanced AI model, has the potential to significantly contribute to the standardization and optimization of radiology reporting, offering improved efficiency and communication in clinical practice.

Key Points
  • Large language model-generated radiology reports exhibited high content similarity and moderate structural resemblance to radiologist-generated reports.
  • Performance metrics highlighted the strong matching of word selection and order, as well as high semantic similarity between AI and radiologist-generated reports.
  • Large language model demonstrated potential for generating standardized radiology reports, improving efficiency and communication in clinical settings.
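
The abstract names its similarity metrics but does not say how they were computed. The following is a minimal sketch of two of them, assuming a bag-of-words cosine similarity and the character-level ratio from Python's standard `difflib.SequenceMatcher`; neither choice is confirmed by the source, and the function names and sample texts are hypothetical. BLEU and BERTScore are omitted because they need third-party libraries.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words count vectors (an assumed variant)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sequence_similarity(a, b):
    """Character-level similarity ratio in [0, 1] from difflib.SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def length_difference(original, rewritten):
    """Return (words, characters) by which the original exceeds the rewrite."""
    word_diff = len(tokenize(original)) - len(tokenize(rewritten))
    return word_diff, len(original) - len(rewritten)


if __name__ == "__main__":
    # Hypothetical report fragments, not taken from the study data.
    radiologist = "There is a 2 cm lesion seen within the upper pole of the left kidney."
    rewritten = "Left kidney, upper pole: 2 cm lesion."
    print("cosine:", round(cosine_similarity(radiologist, rewritten), 3))
    print("sequence:", round(sequence_similarity(radiologist, rewritten), 3))
    print("fewer words/chars:", length_difference(radiologist, rewritten))
```

Averaging such per-pair scores over all report pairs would give summary figures of the kind reported above, although the study's exact preprocessing and vectorization may differ.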
Pages: 3566-3574
Page count: 9