Comparative Evaluation of Large Language Models for Translating Radiology Reports into Hindi

Times cited: 0
Authors
Gupta, Amit [1]
Rastogi, Ashish [1]
Malhotra, Hema [1]
Rangarajan, Krithika [1]
Affiliations
[1] All India Inst Med Sci, Dr BRA IRCH, Dept Radiol, New Delhi, India
Keywords
large language models; ChatGPT; radiology reports; BLEU score
DOI
10.1055/s-0044-1789618
Chinese Library Classification number
R8 [Special medicine]; R445 [Diagnostic imaging]
Subject classification code
1002; 100207; 1009
Abstract
Objective: The aim of this study was to compare the performance of four publicly available large language models (LLMs), namely GPT-4o, GPT-4, Gemini, and Claude Opus, in translating radiology reports into simple Hindi.

Materials and Methods: In this retrospective study, 100 computed tomography (CT) scan report impressions were gathered from a tertiary care cancer center. Reference translations of these impressions into simple Hindi were prepared by a bilingual radiology staff member in consultation with a radiologist. Two distinct prompts were used to assess the LLMs' ability to translate these report impressions into simple Hindi. Translated reports were assessed by a radiologist for instances of misinterpretation, omission, and addition of fictitious information. Translation quality was assessed using Bilingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation with Explicit ORdering (METEOR), Translation Edit Rate (TER), and character F-score (CHRF) scores. Statistical analyses were performed to compare LLM performance across prompts.

Results: On radiologist evaluation of the 800 LLM-generated translated report impressions, nine instances of misinterpretation and two instances of omission of information were found. For prompt 1, Gemini outperformed the other models in BLEU (p < 0.001) and METEOR scores (p = 0.001), and was superior to GPT-4o and GPT-4 in TER and CHRF (p < 0.001), but comparable to Claude (p = 0.501 for TER and p = 0.90 for CHRF). For prompt 2, GPT-4o outperformed all other models (p < 0.001) in all metrics. Prompt 2 yielded better BLEU, METEOR, and CHRF scores (p < 0.001), while prompt 1 had a better TER score (p < 0.001).

Conclusion: While each LLM's effectiveness varied with prompt wording, all models demonstrated potential in translating and simplifying radiology report impressions.
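For readers unfamiliar with the character F-score named in the abstract, the following is a minimal sketch of a chrF-style computation. It is not the authors' evaluation code, and the abstract does not say which implementation or settings they used. The function names, the n-gram order of 6, and beta = 2 are assumptions chosen to mirror commonly used defaults. Whitespace is stripped before n-grams are extracted, and characters are counted as Unicode code points, so a Devanagari vowel sign counts as its own character.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: F-beta over averaged character n-gram precision and recall.

    max_n and beta are assumed defaults, not values taken from the study.
    Returns a value in [0, 1]; higher means closer to the reference.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            # One side is shorter than n characters; skip this order.
            continue
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

A model translation would be passed as `hypothesis` and the human reference translation as `reference`. With beta = 2, recall is weighted more heavily than precision, so a translation that drops reference content is penalized more than one that adds extra wording.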
Pages: 88-96
Number of pages: 9
Related papers
16 in total
  • [1] Amin KS, 2023, RADIOLOGY, V309, DOI 10.1148/radiol.232561
  • [2] Chatbots and Large Language Models in Radiology: A Practical Primer for Clinical and Research Applications
    Bhayana, Rajesh
    [J]. RADIOLOGY, 2024, 310 (01)
  • [3] Informed or anxious: patient preferences for release of test results of increasing sensitivity on electronic patient portals
    Bruno, Bethany
    Steele, Scott
    Carbone, Justin
    Schneider, Katherine
    Posk, Lori
    Rose, Susannah L.
    [J]. HEALTH AND TECHNOLOGY, 2022, 12 (01) : 59 - 67
  • [4] Doshi R., 2023, MEDRXIV, DOI 10.1101/2023.06.04.23290786
  • [5] Fan L, 2023, A bibliometric review of large language models research from 2017 to 2023. arXiv:2304.02020
  • [6] Patient-centered Radiology
    Itri, Jason N.
    [J]. RADIOGRAPHICS, 2015, 35 (06) : 1835 - U227
  • [7] ChatGPT makes medicine easy to swallow: an exploratory case study on simplified radiology reports
    Jeblick, Katharina
    Schachtner, Balthasar
    Dexl, Jakob
    Mittermeier, Andreas
    Stueber, Anna Theresa
    Topalis, Johanna
    Weber, Tobias
    Wesp, Philipp
    Sabel, Bastian Oliver
    Ricke, Jens
    Ingrisch, Michael
    [J]. EUROPEAN RADIOLOGY, 2024, 34 (05) : 2817 - 2825
  • [8] The METEOR metric for automatic evaluation of machine translation
    Lavie, Alon
    Denkowski, Michael J.
    [J]. MACHINE TRANSLATION, 2009, 23 (2-3) : 105 - 115
  • [9] Decoding radiology reports: Potential application of OpenAI ChatGPT to enhance patient understanding of diagnostic reports
    Li, Hanzhou
    Moon, John T.
    Iyer, Deepak
    Balthazar, Patricia
    Krupinski, Elizabeth A.
    Bercu, Zachary L.
    Newsome, Janice M.
    Banerjee, Imon
    Gichoya, Judy W.
    Trivedi, Hari M.
    [J]. CLINICAL IMAGING, 2023, 101 : 137 - 141
  • [10] Translating radiology reports into plain language using ChatGPT and GPT-4 with prompt learning: results, limitations, and potential
    Lyu, Qing
    Tan, Josh
    Zapadka, Michael E.
    Ponnatapura, Janardhana
    Niu, Chuang
    Myers, Kyle J.
    Wang, Ge
    Whitlow, Christopher T.
    [J]. VISUAL COMPUTING FOR INDUSTRY BIOMEDICINE AND ART, 2023, 6 (01)