An open-source fine-tuned large language model for radiological impression generation: a multi-reader performance study

Cited by: 1
Authors
Serapio, Adrian [1 ]
Chaudhari, Gunvant [3 ]
Savage, Cody [2 ]
Lee, Yoo Jin [1 ]
Vella, Maya [1 ]
Sridhar, Shravan [1 ]
Schroeder, Jamie Lee [4 ]
Liu, Jonathan [1 ]
Yala, Adam [5 ,6 ]
Sohn, Jae Ho [1 ]
Affiliations
[1] Univ Calif San Francisco, Dept Radiol & Biomed Imaging, San Francisco, CA 94143 USA
[2] Univ Maryland, Med Ctr, Dept Radiol, Baltimore, MD USA
[3] Univ Washington, Dept Radiol, Seattle, WA USA
[4] MedStar Georgetown Univ Hosp, Washington, DC USA
[5] Univ Calif Berkeley, Computat Precis Hlth, Berkeley, CA USA
[6] Univ Calif San Francisco, San Francisco, CA USA
Source
BMC MEDICAL IMAGING | 2024, Vol. 24, No. 1
Keywords
Natural language processing; Large language model; Open-source; Summarization; Impressions;
DOI
10.1186/s12880-024-01435-w
Chinese Library Classification
R8 [Special Medicine]; R445 [Diagnostic Imaging];
Discipline Classification Codes
1002; 100207; 1009;
Abstract
Background: The impression section integrates the key findings of a radiology report but can be subjective and variable. We sought to fine-tune and evaluate an open-source large language model (LLM) for automatically generating impressions from the remainder of a radiology report across different imaging modalities and hospitals.
Methods: In this institutional review board-approved retrospective study, we collated a dataset of CT, US, and MRI radiology reports from the University of California San Francisco Medical Center (UCSFMC) (n = 372,716) and the Zuckerberg San Francisco General (ZSFG) Hospital and Trauma Center (n = 60,049), both under a single institution. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score, an automatic natural language evaluation metric that measures word overlap, was used for automatic evaluation. A reader study with five cardiothoracic radiologists was performed to more strictly evaluate the model's performance on a specific modality (chest CT exams) against a subspecialist radiologist baseline. We stratified the results of the reader performance study by diagnosis category and original impression length to gauge case complexity.
Results: The LLM achieved ROUGE-L scores of 46.51, 44.2, and 50.96 on UCSFMC and, upon external validation, ROUGE-L scores of 40.74, 37.89, and 24.61 on ZSFG across the CT, US, and MRI modalities, respectively, implying substantial overlap between the model-generated impressions and the impressions written by the subspecialist attending radiologists, with some degradation upon external validation.
In our reader study, the model-generated impressions achieved overall mean scores of 3.56/4, 3.92/4, 3.37/4, 18.29 s, 12.32 words, and 84, while the original impressions written by subspecialist radiologists achieved overall mean scores of 3.75/4, 3.87/4, 3.54/4, 12.2 s, 5.74 words, and 89 for clinical accuracy, grammatical accuracy, stylistic quality, edit time, edit distance, and ROUGE-L score, respectively. The LLM achieved its highest clinical accuracy ratings for acute/emergent findings and for shorter impressions.
Conclusions: An open-source fine-tuned LLM can generate impressions with a satisfactory level of clinical accuracy, grammatical accuracy, and stylistic quality. Our reader performance study demonstrates the potential of large language models to draft radiology report impressions, which could help streamline radiologists' workflows.
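The ROUGE-L metric reported above is based on the longest common subsequence (LCS) between a candidate text and a reference text. As a minimal illustrative sketch (not the authors' evaluation code, and without the stemming or tokenization choices a production ROUGE implementation may apply), an F1-style ROUGE-L over whitespace tokens can be computed as follows:

```python
def lcs_len(a, b):
    """Length of the longest common subsequence via classic dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(candidate, reference):
    """F1-style ROUGE-L over lowercased whitespace tokens."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

# Example: a model impression vs. a (hypothetical) reference impression
score = rouge_l_f1("no acute abnormality",
                   "no acute cardiopulmonary abnormality")
```

Here the LCS is the three shared tokens, giving precision 3/3 and recall 3/4, so the F1 score is about 0.857; scores in the paper are reported on a 0-100 scale.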
Pages: 14