An open-source fine-tuned large language model for radiological impression generation: a multi-reader performance study

Cited by: 1
Authors
Serapio, Adrian [1 ]
Chaudhari, Gunvant [3 ]
Savage, Cody [2 ]
Lee, Yoo Jin [1 ]
Vella, Maya [1 ]
Sridhar, Shravan [1 ]
Schroeder, Jamie Lee [4 ]
Liu, Jonathan [1 ]
Yala, Adam [5 ,6 ]
Sohn, Jae Ho [1 ]
Affiliations
[1] Univ Calif San Francisco, Dept Radiol & Biomed Imaging, San Francisco, CA 94143 USA
[2] Univ Maryland, Med Ctr, Dept Radiol, Baltimore, MD USA
[3] Univ Washington, Dept Radiol, Seattle, WA USA
[4] MedStar Georgetown Univ Hosp, Washington, DC USA
[5] Univ Calif Berkeley, Computat Precis Hlth, Berkeley, CA USA
[6] Univ Calif San Francisco, San Francisco, CA USA
Source
BMC MEDICAL IMAGING | 2024, Vol. 24, No. 1
Keywords
Natural language processing; Large language model; Open-source; Summarization; Impressions;
DOI
10.1186/s12880-024-01435-w
Chinese Library Classification
R8 [Special Medicine]; R445 [Diagnostic Imaging];
Discipline Classification Codes
1002; 100207; 1009;
Abstract
Background: The impression section integrates the key findings of a radiology report but can be subjective and variable. We sought to fine-tune and evaluate an open-source large language model (LLM) for automatically generating impressions from the remainder of a radiology report across different imaging modalities and hospitals.
Methods: In this institutional review board-approved retrospective study, we collated a dataset of CT, US, and MRI radiology reports from the University of California San Francisco Medical Center (UCSFMC) (n = 372,716) and the Zuckerberg San Francisco General (ZSFG) Hospital and Trauma Center (n = 60,049), both under a single institution. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score, an automatic natural language evaluation metric that measures word overlap, was used for automatic evaluation. A reader study with five cardiothoracic radiologists was performed to more strictly evaluate the model's performance on a specific modality (chest CT exams) against a subspecialist radiologist baseline. We stratified the results of the reader performance study by diagnosis category and original impression length to gauge case complexity.
Results: The LLM achieved ROUGE-L scores of 46.51, 44.2, and 50.96 on UCSFMC and, upon external validation, ROUGE-L scores of 40.74, 37.89, and 24.61 on ZSFG across the CT, US, and MRI modalities, respectively, implying substantial overlap between the model-generated impressions and the impressions written by the subspecialist attending radiologists, with some degradation upon external validation.
In our reader study, the model-generated impressions achieved overall mean scores of 3.56/4, 3.92/4, 3.37/4, 18.29 s, 12.32 words, and 84, while the original impressions written by subspecialist radiologists achieved overall mean scores of 3.75/4, 3.87/4, 3.54/4, 12.2 s, 5.74 words, and 89 for clinical accuracy, grammatical accuracy, stylistic quality, edit time, edit distance, and ROUGE-L score, respectively. The LLM achieved its highest clinical accuracy ratings for acute/emergent findings and for shorter impressions.
Conclusions: An open-source fine-tuned LLM can generate impressions with a satisfactory level of clinical accuracy, grammatical accuracy, and stylistic quality. Our reader performance study demonstrates the potential of large language models to draft radiology report impressions, which could help streamline radiologists' workflows.
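The ROUGE-L metric reported above is based on the longest common subsequence (LCS) between a candidate text and a reference text. As a minimal illustrative sketch (not the authors' evaluation code, and without the stemming or tokenization choices a production ROUGE implementation may apply), an F1-style ROUGE-L over whitespace tokens can be computed as follows:

```python
def lcs_len(a, b):
    """Length of the longest common subsequence via classic dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(candidate, reference):
    """F1-style ROUGE-L over lowercased whitespace tokens."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

# Example: a model impression vs. a (hypothetical) reference impression
score = rouge_l_f1("no acute abnormality",
                   "no acute cardiopulmonary abnormality")
```

Here the LCS is the three shared tokens, giving precision 3/3 and recall 3/4, so the F1 score is about 0.857; scores in the paper are reported on a 0-100 scale.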
Pages: 14