Generative pretrained transformer-4, an artificial intelligence text predictive model, has a high capability for passing novel written radiology exam questions

Cited by: 4
Authors
Sood, Avnish [1]
Mansoor, Nina [2]
Memmi, Caroline [3]
Lynch, Magnus [4,5]
Lynch, Jeremy [2]
Affiliations
[1] Kings Coll London, London WC2R 2LS, England
[2] Kings Coll Hosp London, Dept Neuroradiol, Denmark Hill, London SE5 9RS, England
[3] Imperial Coll London, Exhibit Rd, London SW7 2AZ, England
[4] Kings Coll London, Guys Hosp, Ctr Stem Cells & Regenerat Med, London, England
[5] Kings Coll London, St Johns Inst Dermatol, London, England
Keywords
Artificial intelligence; Large language model; Image interpretation; Radiology examination;
DOI
10.1007/s11548-024-03071-9
Chinese Library Classification (CLC)
R318 [Biomedical Engineering]
Subject Classification Code
0831
Abstract
Purpose: AI image interpretation, through convolutional neural networks, shows increasing capability within radiology. These models have achieved impressive performance in specific tasks within controlled settings but possess inherent limitations, such as the inability to consider clinical context. We assess the ability of large language models (LLMs) on radiology specialty exams to determine whether they can evaluate relevant clinical information.

Methods: A database of questions was created from official sample, author-written, and textbook questions based on the Royal College of Radiologists (United Kingdom) FRCR 2A and American Board of Radiology (ABR) Certifying examinations. The questions were input into Generative Pretrained Transformer (GPT) versions 3 and 4, with prompting to answer the questions.

Results: One thousand and seventy-two questions were evaluated by GPT-3 and GPT-4: 495 (46.2%) for the FRCR 2A and 577 (53.8%) for the ABR exam. There were 890 single-best-answer (SBA) and 182 true/false questions. GPT-4 was correct in 629/890 (70.7%) SBA and 151/182 (83.0%) true/false questions, with no degradation on author-written questions. GPT-4 performed significantly better than GPT-3, which selected the correct answer in 282/890 (31.7%) SBA and 111/182 (61.0%) true/false questions. GPT-4's performance was similar across both examinations for all categories of question.

Conclusion: The newest generation of LLMs, GPT-4, demonstrates high capability in answering radiology exam questions. It shows marked improvement over GPT-3, suggesting further improvements in accuracy are possible. Further research is needed to explore the clinical applicability of these AI models in real-world settings.
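To give a concrete sense of what "inputting the questions with prompting" can look like in practice, the sketch below shows one way a single-best-answer question could be sent to GPT-4 through the OpenAI Python SDK. This is an illustration only, not the authors' published pipeline: the prompt wording, the ask_exam_question helper, and the SDK version are assumptions.

```python
# Illustrative sketch only -- assumes the OpenAI Python SDK (>= 1.0)
# and an API key available in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

def ask_exam_question(question: str, options: list[str]) -> str:
    """Send one single-best-answer (SBA) question to GPT-4 and return its reply."""
    prompt = (
        "Answer the following radiology exam question by choosing the single "
        "best option. Reply with the letter of your chosen option only.\n\n"
        f"{question}\n"
        + "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output makes scoring reproducible
    )
    return response.choices[0].message.content.strip()

# Example usage with a made-up question (not from the study's database)
answer = ask_exam_question(
    "Which imaging modality is first-line for suspected acute pulmonary embolism?",
    ["Chest radiograph", "CT pulmonary angiogram", "V/Q scan", "MRI thorax"],
)
print(answer)
```

Constraining the reply to a single option letter, as in this sketch, makes automated marking against the answer key straightforward; the exact constraints the authors used are not stated in the abstract.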
Pages: 645-653
Page count: 9