Preliminary assessment of automated radiology report generation with generative pre-trained transformers: comparing results to radiologist-generated reports

Cited by: 30
Authors
Nakaura, Takeshi [1]
Yoshida, Naofumi [1]
Kobayashi, Naoki [1]
Shiraishi, Kaori [1]
Nagayama, Yasunori [1]
Uetani, Hiroyuki [1]
Kidoh, Masafumi [1]
Hokamura, Masamichi [1]
Funama, Yoshinori [2]
Hirai, Toshinori [1]
Affiliations
[1] Kumamoto Univ, Grad Sch Med Sci, Dept Diagnost Radiol, 1-1-1 Honjo,Chuo Ku, Kumamoto, Kumamoto 8608556, Japan
[2] Kumamoto Univ, Fac Life Sci, Dept Med Phys, Honjo 1-1-1, Kumamoto 8608556, Japan
Keywords
Radiology report; Computed tomography; Deep learning; Large language model; Generative pre-trained transformer
DOI
10.1007/s11604-023-01487-y
Chinese Library Classification number
R8 [Special medicine]; R445 [Diagnostic imaging]
Discipline classification code
1002; 100207; 1009
Abstract
Purpose: In this preliminary study, we aimed to evaluate the potential of the generative pre-trained transformer (GPT) series for generating radiology reports from concise imaging findings and to compare its performance with that of radiologist-generated reports.
Methods: This retrospective study involved 28 patients who underwent computed tomography (CT) scans and had a diagnosed disease with typical imaging findings. Radiology reports were generated using GPT-2, GPT-3.5, and GPT-4 based on the patient's age, gender, disease site, and imaging findings. We calculated the top-1 accuracy, top-5 accuracy, and mean average precision (MAP) of differential diagnoses for GPT-2, GPT-3.5, GPT-4, and radiologists. Two board-certified radiologists evaluated the grammar and readability, image findings, impression, differential diagnosis, and overall quality of all reports using a 4-point scale.
Results: Top-1 and top-5 accuracies for the differential diagnoses were highest for radiologists, followed by GPT-4, GPT-3.5, and GPT-2, in that order (top-1: 1.00, 0.54, 0.54, and 0.21, respectively; top-5: 1.00, 0.96, 0.89, and 0.54, respectively). There were no significant differences in qualitative scores for grammar and readability, image findings, or overall quality between radiologists and GPT-3.5 or GPT-4 (p > 0.05). However, the impression and differential diagnosis scores of the GPT series were significantly lower than those of radiologists (p < 0.05).
Conclusions: Our preliminary study suggests that GPT-3.5 and GPT-4 may be able to generate radiology reports with high readability and reasonable image findings from very short keywords; however, concerns persist regarding the accuracy of impressions and differential diagnoses, so verification by radiologists is required.
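The abstract names the ranking metrics but does not say how they were computed. The sketch below shows one common way to compute top-k accuracy and MAP for ranked differential diagnoses. It assumes each case has exactly one correct diagnosis, in which case average precision reduces to the reciprocal of the rank of the correct diagnosis. The function names and the example cases are hypothetical and are not taken from the study.

```python
from typing import Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int) -> float:
    """Fraction of cases whose correct diagnosis appears in the first k ranked candidates."""
    hits = sum(1 for candidates, correct in zip(ranked, truth) if correct in candidates[:k])
    return hits / len(truth)


def mean_average_precision(ranked: Sequence[Sequence[str]], truth: Sequence[str]) -> float:
    """MAP under the assumption of a single correct diagnosis per case.

    With one relevant item, average precision is 1 / rank of that item,
    or 0 when the correct diagnosis is not listed at all.
    """
    total = 0.0
    for candidates, correct in zip(ranked, truth):
        if correct in candidates:
            total += 1.0 / (list(candidates).index(correct) + 1)
    return total / len(truth)


# Hypothetical example: three cases, each with a ranked differential list.
example_ranked = [
    ["appendicitis", "diverticulitis"],       # correct diagnosis ranked 1st
    ["pneumonia", "pulmonary embolism"],      # correct diagnosis ranked 2nd
    ["renal cyst", "angiomyolipoma"],         # correct diagnosis not listed
]
example_truth = ["appendicitis", "pulmonary embolism", "renal cell carcinoma"]

top1 = top_k_accuracy(example_ranked, example_truth, k=1)
top5 = top_k_accuracy(example_ranked, example_truth, k=5)
mean_ap = mean_average_precision(example_ranked, example_truth)
print(top1, top5, mean_ap)
```

In this toy example only the first case counts toward top-1, the first two count toward top-5, and MAP averages the reciprocal ranks 1, 1/2, and 0.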
Pages: 190-200
Number of pages: 11