Evaluation of large language models performance against humans for summarizing MRI knee radiology reports: A feasibility study

Cited by: 4
Authors
Lopez-Ubeda, Pilar [1]
Martin-Noguerol, Teodoro [2]
Diaz-Angulo, Carolina [3]
Luna, Antonio [2]
Affiliations
[1] Nat Language Proc Unit, Hlth Time, Jaen, Spain
[2] Hlth Time, MRI Unit, Radiol Dept, Jaen, Spain
[3] Hlth Time, MRI Unit, Radiol Dept, Jaen, Spain
Keywords
Radiology report summarization; Natural Language Processing; Large Language Model; Knee MRI reports; Human expert evaluation;
DOI
10.1016/j.ijmedinf.2024.105443
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objectives: This study addresses the critical need for accurate summarization in radiology by comparing various Large Language Model (LLM)-based approaches for automatic summary generation. With the increasing volume of patient information, accurately and concisely conveying radiological findings becomes crucial for effective clinical decision-making. Minor inaccuracies in summaries can lead to significant consequences, highlighting the need for reliable automated summarization tools. Methods: We employed two language models, Text-to-Text Transfer Transformer (T5) and Bidirectional and Auto-Regressive Transformers (BART), in both fine-tuned and zero-shot learning scenarios and compared them with a Recurrent Neural Network (RNN). Additionally, we conducted a comparative analysis of 100 MRI report summaries, using expert human judgment and criteria such as coherence, relevance, fluency, and consistency, to evaluate the models against the original radiologist summaries. To facilitate this, we compiled a dataset of 15,508 retrospective knee Magnetic Resonance Imaging (MRI) reports from our Radiology Information System (RIS), focusing on the findings section to predict the radiologist's summary. Results: The fine-tuned models outperform the neural network and show superior performance in the zero-shot variant. Specifically, the T5 model achieved a ROUGE-L score of 0.638. Based on the radiologist readers' study, the summaries produced by this model were found to be very similar to those produced by a radiologist, with about 70% similarity in fluency and consistency between the T5-generated summaries and the original ones. Conclusions: Technological advances, especially in NLP and LLMs, hold great promise for improving and streamlining the summarization of radiological findings, thus providing valuable assistance to radiologists in their work.
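For readers unfamiliar with the ROUGE-L metric reported in the abstract, the following is a minimal sketch of how such a score can be computed from the longest common subsequence (LCS) of tokens shared by a reference summary and a generated one. The abstract does not state which implementation, tokenization, or variant (precision, recall, or F-measure) the authors used, so the whitespace tokenization, lowercasing, the balanced F-measure, the function names `lcs_length` and `rouge_l`, and the example sentences are all assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Dynamic programming with a single rolling row to keep memory small.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate, beta=1.0):
    """ROUGE-L F-measure between two strings (assumed variant, see lead-in)."""
    # Assumption: lowercase and split on whitespace; real toolkits may
    # apply stemming or a different tokenizer.
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)        # share of the reference that is covered
    precision = lcs / len(cand)    # share of the candidate that is matched
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


# Hypothetical sentences, not taken from the study's dataset.
reference = "medial meniscus tear with joint effusion"
candidate = "tear of the medial meniscus with effusion"
score = rouge_l(reference, candidate)
print(f"ROUGE-L F1: {score:.3f}")
```

In this toy pair the longest common subsequence is "medial meniscus with effusion" (4 tokens), giving recall 4/6 and precision 4/7. Because ROUGE-L rewards only in-order token overlap, it says nothing about clinical correctness, which is why the study pairs it with expert ratings of coherence, relevance, fluency, and consistency.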
Pages: 10
Related papers
37 in total
  • [1] Generating colloquial radiology reports with large language models
    Tang, Cynthia Crystal
    Nagesh, Supriya
    Fussell, David A.
    Glavis-Bloom, Justin
    Mishra, Nina
    Li, Charles
    Cortes, Gillean
    Hill, Robert
    Zhao, Jasmine
    Gordon, Angellica
    Wright, Joshua
    Troutt, Hayden
    Tarrago, Rod
    Chow, Daniel S.
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (11) : 2660 - 2667
  • [2] Large Language Models for Simplified Interventional Radiology Reports: A Comparative Analysis
    Can, Elif
    Uller, Wibke
    Vogt, Katharina
    Doppler, Michael C.
    Busch, Felix
    Bayerl, Nadine
    Ellmann, Stephan
    Kader, Avan
    Elkilany, Aboelyazid
    Makowski, Marcus R.
    Bressem, Keno K.
    Adams, Lisa C.
    ACADEMIC RADIOLOGY, 2025, 32 (02) : 888 - 898
  • [3] Use of large language models in radiological reports: A study on simplifying turkish MRI findings
    Cesur, Turay
    Camur, Eren
    Gunes, Yasin Celal
    ANNALS OF CLINICAL AND ANALYTICAL MEDICINE, 2024, 15 (08) : 586 - 590
  • [4] Evaluation of radiology residents' reporting skills using large language models: an observational study
    Atsukawa, Natsuko
    Tatekawa, Hiroyuki
    Oura, Tatsushi
    Matsushita, Shu
    Horiuchi, Daisuke
    Takita, Hirotaka
    Mitsuyama, Yasuhito
    Omori, Ayako
    Shimono, Taro
    Miki, Yukio
    Ueda, Daiju
    JAPANESE JOURNAL OF RADIOLOGY, 2025
  • [5] Adapting Large Language Models for Automatic Annotation of Radiology Reports for Metastases Detection
    Barabadi, Maede Ashofteh
    Chan, Wai Yip
    Zhu, Xiaodan
    Simpson, Amber L.
    Do, Richard K. G.
    2024 IEEE CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, CCECE 2024, 2024 : 340 - 345
  • [6] Automated anonymization of radiology reports: comparison of publicly available natural language processing and large language models
    Langenbach, Marcel C.
    Foldyna, Borek
    Hadzic, Ibrahim
    Langenbach, Isabel L.
    Raghu, Vineet K.
    Lu, Michael T.
    Neilan, Tomas G.
    Heemelaar, Julius C.
    EUROPEAN RADIOLOGY, 2024 : 2634 - 2641
  • [7] Large language models for efficient whole-organ MRI score-based reports and categorization in knee osteoarthritis
    Yuxue Xie
    Zhonghua Hu
    Hongyue Tao
    Yiwen Hu
    Haoyu Liang
    Xinmin Lu
    Lei Wang
    Xiangwen Li
    Shuang Chen
    Insights into Imaging, 16 (1)
  • [8] Automated classification of brain MRI reports using fine-tuned large language models
    Kanzawa, Jun
    Yasaka, Koichiro
    Fujita, Nana
    Fujiwara, Shin
    Abe, Osamu
    NEURORADIOLOGY, 2024, 66 (12) : 2177 - 2183
  • [9] A COMPARATIVE STUDY: PERFORMANCE OF LARGE LANGUAGE MODELS IN SIMPLIFYING TURKISH COMPUTED TOMOGRAPHY REPORTS
    Camur, Eren
    Cesur, Turay
    Gunes, Yasin Celal
    JOURNAL OF ISTANBUL FACULTY OF MEDICINE-ISTANBUL TIP FAKULTESI DERGISI, 2024, 87 (04) : 321 - 326
  • [10] Inferring cancer disease response from radiology reports using large language models with data augmentation and prompting
    Tan, Ryan Shea Ying Cong
    Lin, Qian
    Low, Guat Hwa
    Lin, Ruixi
    Goh, Tzer Chew
    Chang, Christopher Chu En
    Lee, Fung Fung
    Chan, Wei Yin
    Tan, Wei Chong
    Tey, Han Jieh
    Leong, Fun Loon
    Tan, Hong Qi
    Nei, Wen Long
    Chay, Wen Yee
    Tai, David Wai Meng
    Lai, Gillianne Geet Yi
    Cheng, Lionel Tim-Ee
    Wong, Fuh Yong
    Chua, Matthew Chin Heng
    Chua, Melvin Lee Kiang
    Tan, Daniel Shao Weng
    Thng, Choon Hua
    Tan, Iain Bee Huat
    Ng, Hwee Tou
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2023, 30 (10) : 1657 - 1664