Large Language Model Ability to Translate CT and MRI Free-Text Radiology Reports Into Multiple Languages

Times cited: 0
Authors
Meddeb, Aymen [1, 5, 6]
Lueken, Sophia [2, 3, 4]
Busch, Felix [7]
Adams, Lisa [7]
Ugga, Lorenzo [8]
Koltsakis, Emmanouil [9]
Tzortzakakis, Antonios [10]
Jelassi, Soumaya [11]
Dkhil, Insaf [11]
Klontzas, Michail E. [12, 13]
Triantafyllou, Matthaios [12]
Kocak, Burak [14]
Yuezkan, Sabahattin [15]
Zhang, Longjiang [16]
Hu, Bin [16]
Andreychenko, Anna [17]
Yurievich, Efimtcev Alexander [17]
Logunova, Tatiana [17]
Morakote, Wipawee [18]
Angkurawaranon, Salita [18]
Makowski, Marcus R. [7]
Wattjes, Mike P. [1]
Cuocolo, Renato [19]
Bressem, Keno [7, 20]
Affiliations
[1] Charite Univ Med Berlin, Dept Neuroradiol, Berlin, Germany
[2] Charite Univ Med Berlin, Dept Radiol, Berlin, Germany
[3] Free Univ Berlin, Berlin, Germany
[4] Humboldt Univ, Berlin, Germany
[5] Univ Reims, Hop Maison Blanche, Dept Neuroradiol, CHU Reims, 45 Rue Cognacq Jay, F-51092 Reims, France
[6] Charite Univ Med Berlin, Berlin Inst Hlth, Berlin, Germany
[7] Tech Univ Munich, TUM Univ Hosp, Sch Med & Hlth, Dept Diagnost & Intervent Radiol, Klinikum Rechts I, Munich, Germany
[8] Univ Naples Federico II, Dept Adv Biomed Sci, Naples, Italy
[9] Karolinska Univ Hosp, Dept Radiol, Stockholm, Sweden
[10] Karolinska Inst, Dept Clin Sci Intervent & Technol CLINTEC, Div Radiol, Stockholm, Sweden
[11] Natl Inst Mongi Ben Hamida Neurol, Dept Radiol, Tunis, Tunisia
[12] Univ Crete, Sch Med, Dept Radiol, Iraklion, Greece
[13] Fdn Res & Technol FORTH, Inst Comp Sci, Computat Biomed Lab, Iraklion, Greece
[14] Univ Hlth Sci, Basaksehir Cam & Sakura City Hosp, Dept Radiol, Basaksehir, Istanbul, Turkiye
[15] Koc Univ Hosp, Dept Radiol, Istanbul, Turkiye
[16] Nanjing Univ, Jinling Hosp, Affiliated Hosp, Med Sch, Dept Radiol, Nanjing, Peoples R China
[17] ITMO Univ, Lab Digital Publ Hlth Technol, St Petersburg, Russia
[18] Chiang Mai Univ, Dept Radiol, Chiang Mai, Thailand
[19] Univ Salerno, Dept Med Surg & Dent, Baronissi, Italy
[20] Tech Univ Munich, TUM Univ Hosp, Inst Cardiovasc Radiol & Nucl Med, German Heart Ctr Munich, Munich, Germany
DOI
10.1148/radiol.241736
Chinese Library Classification (CLC) numbers
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Discipline classification codes
1002; 100207; 1009
Abstract
Background: High-quality translations of radiology reports are essential for optimal patient care. Because of the limited availability of human translators with medical expertise, large language models (LLMs) are a promising solution, but their ability to translate radiology reports remains largely unexplored.

Purpose: To evaluate the accuracy and quality of various LLMs in translating radiology reports across high-resource languages (English, Italian, French, German, and Chinese) and low-resource languages (Swedish, Turkish, Russian, Greek, and Thai).

Materials and Methods: A dataset of 100 synthetic free-text radiology reports from CT and MRI scans was translated into nine target languages by 18 radiologists between January 14 and May 2, 2024. Ten LLMs, including GPT-4 (OpenAI), Llama 3 (Meta), and Mixtral models (Mistral AI), were used for automated translation. Translation accuracy and quality were assessed with use of the BiLingual Evaluation Understudy (BLEU) score, translation error rate (TER), and CHaRacter-level F-score (chrF++) metrics. Statistical significance was evaluated with use of paired t tests with Holm-Bonferroni corrections. Radiologists also conducted a qualitative evaluation of the translations with use of a standardized questionnaire.

Results: GPT-4 demonstrated the best overall translation quality, particularly from English to German (BLEU: 35.0 ± 16.3 [SD]; TER: 61.7 ± 21.2; chrF++: 70.6 ± 9.4), to Greek (BLEU: 32.6 ± 10.1; TER: 52.4 ± 10.6; chrF++: 62.8 ± 6.4), to Thai (BLEU: 53.2 ± 7.3; TER: 74.3 ± 5.2; chrF++: 48.4 ± 6.6), and to Turkish (BLEU: 35.5 ± 6.6; TER: 52.7 ± 7.4; chrF++: 70.7 ± 3.7). GPT-3.5 showed the highest accuracy in English-to-French translations, Qwen1.5 excelled in English-to-Chinese translations, and Mixtral 8x22B performed best in Italian-to-English translations. The qualitative evaluation revealed that LLMs excelled in clarity, readability, and consistency with the original meaning but showed moderate accuracy in medical terminology.

Conclusion: LLMs showed high accuracy and quality for translating radiology reports, although results varied by model and language pair.
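The abstract states that statistical significance was assessed with paired t tests followed by Holm-Bonferroni correction. The Python sketch below illustrates only the Holm step-down adjustment. It assumes that the per-comparison p-values from the paired t tests are already available. The function name `holm_bonferroni` and the example p-values are hypothetical and do not come from the study.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down adjustment (illustrative sketch, not the study's code).

    Returns (adjusted_p_values, reject_flags) in the same order as the input.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    # A comparison is significant if its adjusted p-value is <= alpha.
    reject = [p <= alpha for p in adjusted]
    return adjusted, reject


if __name__ == "__main__":
    # Hypothetical p-values from four pairwise model comparisons.
    adj, rej = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
    print(adj, rej)
```

With these hypothetical inputs, the adjusted p-values are approximately [0.03, 0.06, 0.06, 0.02]. Only the first and fourth comparisons remain significant at alpha = 0.05, whereas an uncorrected test would have accepted all four. Because the Holm procedure is a step-down method, it controls the family-wise error rate like the plain Bonferroni correction but is uniformly at least as powerful.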
Pages: 11