Assessing the performance of ChatGPT and Bard/Gemini against radiologists for Prostate Imaging-Reporting and Data System classification based on prostate multiparametric MRI text reports

Cited by: 0
Authors
Lee, Kang-Lung [1, 2, 3, 4]
Kessler, Dimitri A. [1, 2]
Caglic, Iztok [2]
Kuo, Yi-Hsin [3]
Shaida, Nadeem [2]
Barrett, Tristan [1, 2]
Affiliations
[1] Univ Cambridge, Dept Radiol, Cambridge CB2 0QQ, England
[2] Cambridge Univ Hosp NHS Fdn Trust, Addenbrookes Hosp, Dept Radiol, Cambridge CB2 0QQ, England
[3] Taipei Vet Gen Hosp, Dept Radiol, Taipei 112, Taiwan
[4] Natl Yang Ming Chiao Tung Univ, Sch Med, Taipei 112, Taiwan
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
prostate MRI; PI-RADS; large language model; ChatGPT; Bard; Gemini;
DOI
10.1093/bjr/tqae236
Chinese Library Classification (CLC) number
R8 [Special medicine]; R445 [Diagnostic imaging];
Discipline classification codes
1002 ; 100207 ; 1009 ;
Abstract
Objectives: Large language models (LLMs) have shown potential for clinical applications. This study assesses their ability to assign Prostate Imaging-Reporting and Data System (PI-RADS) categories based on clinical text reports. Methods: One hundred consecutive biopsy-naïve patients' multiparametric prostate MRI reports were independently classified by 2 uroradiologists, ChatGPT-3.5 (GPT-3.5), ChatGPT-4o mini (GPT-4), Bard, and Gemini. Original report classifications were considered definitive. Results: Out of 100 MRIs, 52 were originally reported as PI-RADS 1-2, 9 as PI-RADS 3, 19 as PI-RADS 4, and 20 as PI-RADS 5. Radiologists demonstrated 95% and 90% accuracy, while GPT-3.5 and Bard both achieved 67%. Accuracy of the updated versions of the LLMs increased to 83% (GPT-4) and 79% (Gemini), respectively. In low-suspicion studies (PI-RADS 1-2), Bard and Gemini (F1: 0.94, 0.98, respectively) outperformed GPT-3.5 and GPT-4 (F1: 0.77, 0.94, respectively), whereas for high-probability MRIs (PI-RADS 4-5), GPT-3.5 and GPT-4 (F1: 0.95, 0.98, respectively) outperformed Bard and Gemini (F1: 0.71, 0.87, respectively). Bard assigned a non-existent PI-RADS 6 "hallucination" for 2 patients. Inter-reader agreements (κ) between the original reports and the senior radiologist, junior radiologist, GPT-3.5, GPT-4, Bard, and Gemini were 0.93, 0.84, 0.65, 0.86, 0.57, and 0.81, respectively. Conclusions: Radiologists demonstrated high accuracy in PI-RADS classification based on text reports, while GPT-3.5 and Bard exhibited poor performance. GPT-4 and Gemini demonstrated improved performance compared to their predecessors.
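The three metrics quoted in the abstract (accuracy, F1 for a group of categories, and inter-reader agreement κ) can be illustrated with a minimal Python sketch. It makes several assumptions: the labels below are invented toy data, not the study's results; the function names are hypothetical; and κ is computed as unweighted Cohen's kappa, whereas the abstract does not state which variant the authors used.

```python
from collections import Counter


def accuracy(reference, predicted):
    """Fraction of reports where the reader matches the reference category."""
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


def f1_for_group(reference, predicted, group):
    """F1 score with the given set of categories treated as the positive class."""
    pairs = list(zip(reference, predicted))
    tp = sum(r in group and p in group for r, p in pairs)
    fp = sum(r not in group and p in group for r, p in pairs)
    fn = sum(r in group and p not in group for r, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(reference, predicted):
    """Unweighted Cohen's kappa (assumed variant): chance-corrected agreement."""
    n = len(reference)
    observed = accuracy(reference, predicted)
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Agreement expected by chance, from each reader's marginal frequencies.
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / n**2
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy data only: hypothetical categories for ten reports, NOT the study's data.
reference = ["1-2", "1-2", "1-2", "1-2", "3", "4", "4", "5", "5", "5"]
predicted = ["1-2", "1-2", "1-2", "3", "3", "4", "5", "5", "5", "4"]

print("accuracy:", accuracy(reference, predicted))
print("F1 (PI-RADS 1-2):", f1_for_group(reference, predicted, {"1-2"}))
print("F1 (PI-RADS 4-5):", f1_for_group(reference, predicted, {"4", "5"}))
print("kappa:", cohen_kappa(reference, predicted))
```

Grouping PI-RADS 4 and 5 into one positive class means a 4-versus-5 disagreement lowers accuracy and κ but not the grouped F1, which is one way a reader can score well on high-probability studies while showing lower overall agreement.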
Pages: 7