Assessing GPT-4 multimodal performance in radiological image analysis

Cited by: 17
Authors
Brin, Dana [1, 2]
Sorin, Vera [1, 2, 3]
Barash, Yiftach [1, 2, 3]
Konen, Eli [1, 2]
Glicksberg, Benjamin S. [4]
Nadkarni, Girish N. [5, 6]
Klang, Eyal [1, 2, 3, 5, 6]
Affiliations
[1] Chaim Sheba Med Ctr, Dept Diagnost Imaging, Tel Hashomer, Israel
[2] Tel Aviv Univ, Fac Med, Tel Aviv, Israel
[3] Chaim Sheba Med Ctr, DeepVis Lab, Tel Hashomer, Israel
[4] Icahn Sch Med Mt Sinai, Hasso Plattner Inst Digital Hlth, New York, NY USA
[5] Icahn Sch Med Mt Sinai, Div Data Driven & Digital Med D3M, New York, NY USA
[6] Icahn Sch Med Mt Sinai, Charles Bronfman Inst Personalized Med, New York, NY USA
Keywords
Artificial intelligence; Diagnostic imaging; Radiology; Ultrasonography; Computed tomography (x-ray);
DOI
10.1007/s00330-024-11035-5
Chinese Library Classification (CLC) number
R8 [Special medicine]; R445 [Diagnostic imaging];
Discipline classification code
1002; 100207; 1009;
Abstract
Objectives: This study aims to assess the performance of a multimodal artificial intelligence (AI) model capable of analyzing both images and textual data (GPT-4V), in interpreting radiological images. It focuses on a range of modalities, anatomical regions, and pathologies to explore the potential of zero-shot generative AI in enhancing diagnostic processes in radiology.

Methods: We analyzed 230 anonymized emergency room diagnostic images, consecutively collected over 1 week, using GPT-4V. Modalities included ultrasound (US), computerized tomography (CT), and X-ray images. The interpretations provided by GPT-4V were then compared with those of senior radiologists. This comparison aimed to evaluate the accuracy of GPT-4V in recognizing the imaging modality, anatomical region, and pathology present in the images.

Results: GPT-4V identified the imaging modality correctly in 100% of cases (221/221), the anatomical region in 87.1% (189/217), and the pathology in 35.2% (76/216). However, the model's performance varied significantly across different modalities, with anatomical region identification accuracy ranging from 60.9% (39/64) in US images to 97% (98/101) and 100% (52/52) in CT and X-ray images (p < 0.001). Similarly, pathology identification ranged from 9.1% (6/66) in US images to 36.4% (36/99) in CT and 66.7% (34/51) in X-ray images (p < 0.001). These variations indicate inconsistencies in GPT-4V's ability to interpret radiological images accurately.

Conclusion: While the integration of AI in radiology, exemplified by multimodal GPT-4, offers promising avenues for diagnostic enhancement, the current capabilities of GPT-4V are not yet reliable for interpreting radiological images. This study underscores the necessity for ongoing development to achieve dependable performance in radiology diagnostics.

Clinical relevance statement: Although GPT-4V shows promise in radiological image interpretation, its high diagnostic hallucination rate (> 40%) indicates it cannot be trusted for clinical use as a standalone tool. Improvements are necessary to enhance its reliability and ensure patient safety.

Key Points...
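
The per-modality figures in the Results can be checked with a few lines of arithmetic. The sketch below recomputes the reported percentages from the counts given in the abstract and runs a Pearson chi-square test of independence across the three modalities. The abstract does not say which statistical test the authors used, so the chi-square test and the helper names (`accuracy_pct`, `chi_square_three_groups`) are assumptions made for illustration only.

```python
import math

# (correct, total) per modality, copied from the abstract's Results.
ANATOMY = {"US": (39, 64), "CT": (98, 101), "X-ray": (52, 52)}
PATHOLOGY = {"US": (6, 66), "CT": (36, 99), "X-ray": (34, 51)}


def accuracy_pct(correct, total):
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)


def chi_square_three_groups(counts):
    """Pearson chi-square on a 3 x 2 table (modality x correct/incorrect).

    Assumption: this is an illustrative test, not necessarily the one
    used in the study. Returns (statistic, p_value).
    """
    assert len(counts) == 3, "closed-form p-value below needs exactly 3 groups"
    total_correct = sum(c for c, _ in counts.values())
    total_n = sum(n for _, n in counts.values())
    pooled_rate = total_correct / total_n

    stat = 0.0
    for correct, n in counts.values():
        expected_correct = n * pooled_rate
        expected_wrong = n * (1 - pooled_rate)
        stat += (correct - expected_correct) ** 2 / expected_correct
        stat += ((n - correct) - expected_wrong) ** 2 / expected_wrong

    # With 3 groups and 2 outcomes there are 2 degrees of freedom, and the
    # chi-square survival function with 2 df is exactly exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value


if __name__ == "__main__":
    for label, table in (("anatomical region", ANATOMY), ("pathology", PATHOLOGY)):
        for modality, (correct, total) in table.items():
            print(f"{label:18s} {modality:6s} {accuracy_pct(correct, total)}%")
        stat, p = chi_square_three_groups(table)
        print(f"{label:18s} chi2 = {stat:.1f}, p = {p:.2e}")
```

Under these assumptions the per-modality counts sum to the pooled totals reported in the abstract (189/217 and 76/216), and both p-values fall well below 0.001, consistent with the reported significance level.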
Pages: 1959-1965
Page count: 7