Evaluating ChatGPT-4's Performance in Identifying Radiological Anatomy in FRCR Part 1 Examination Questions

Cited by: 0
Authors
Sarangi, Pradosh Kumar [1 ]
Datta, Suvrankar [2 ]
Panda, Braja Behari [3 ]
Panda, Swaha [4 ]
Mondal, Himel [5 ]
Affiliations
[1] All India Inst Med Sci, Dept Radiodiag, Deoghar 814152, Jharkhand, India
[2] All India Inst Med Sci, Dept Radiodiag, New Delhi, India
[3] Veer Surendra Sai Inst Med Sci & Res, Dept Radiodiag, Burla, Odisha, India
[4] All India Inst Med Sci, Dept Otorhinolaryngol & Head Neck Surg, Deoghar, Jharkhand, India
[5] All India Inst Med Sci, Dept Physiol, Deoghar, Jharkhand, India
Keywords
artificial intelligence; ChatGPT-4; large language model; radiology; FRCR; anatomy; fellowship; GPT-4
DOI
10.1055/s-0044-1792040
Chinese Library Classification (CLC)
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Subject classification codes
1002; 100207; 1009
Abstract
Background: Radiology is critical for diagnosis and patient care and relies heavily on accurate image interpretation. Recent advances in artificial intelligence (AI) and natural language processing (NLP) have raised interest in the potential of AI models to support radiologists, although robust research on AI performance in this field is still emerging.
Objective: This study assessed the efficacy of ChatGPT-4 in answering radiological anatomy questions similar to those in the Fellowship of the Royal College of Radiologists (FRCR) Part 1 Anatomy examination.
Methods: We used 100 mock radiological anatomy questions from a free website patterned after the FRCR Part 1 Anatomy examination. ChatGPT-4 was tested under two conditions: with and without context about the examination instructions and question format. The main query posed was: "Identify the structure indicated by the arrow(s)." Responses were evaluated against the correct answers, and two expert radiologists (with over 5 and 30 years of experience, respectively, in diagnostic radiology and academics) rated the explanations. We calculated four scores: correctness, sidedness, modality identification, and approximation. The approximation score awards partial credit when the identified structure is present in the image but is not the structure the question targets.
Results: ChatGPT-4 underperformed under both conditions, with correctness scores of 4% without context and 7.5% with context. It identified the imaging modality with 100% accuracy, however, and scored over 50% on the approximation metric, naming structures that were present in the image but not indicated by the arrow. It struggled to identify the correct side of a structure, scoring approximately 42% without context and 40% with context. Only 32% of responses were similar across the two settings.
Conclusion: Despite correctly recognizing the imaging modality, ChatGPT-4 has significant limitations in interpreting normal radiological anatomy. This indicates the need for enhanced training on normal anatomy before it can reliably interpret abnormal radiological images. Identifying the correct side of structures in radiological images also remains a challenge for ChatGPT-4.
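The four-score rubric described in Methods can be made concrete with a short sketch. The Python below is a hypothetical illustration, not the authors' analysis code: the RatedResponse record, its field names, and the mock ratings are all assumptions introduced here to show how the correctness, sidedness, modality-identification, and approximation percentages could be tallied from expert ratings.

```python
from dataclasses import dataclass

@dataclass
class RatedResponse:
    """Expert rating of one ChatGPT-4 answer (fields are assumed, not from the paper)."""
    correct_structure: bool   # named the exact structure indicated by the arrow
    correct_side: bool        # stated the correct side (left/right), where applicable
    correct_modality: bool    # identified the imaging modality (e.g., CT, MRI)
    structure_present: bool   # named a structure visible in the image, even if not the target

def score_rubric(responses: list[RatedResponse]) -> dict[str, float]:
    """Return the four percentage scores described in the abstract."""
    n = len(responses)
    return {
        "correctness": 100 * sum(r.correct_structure for r in responses) / n,
        "sidedness": 100 * sum(r.correct_side for r in responses) / n,
        "modality": 100 * sum(r.correct_modality for r in responses) / n,
        # Approximation: partial credit when the structure is present but not the target.
        "approximation": 100 * sum(r.structure_present for r in responses) / n,
    }

# Mock ratings for illustration only; the study scored 100 questions per condition.
mock = [
    RatedResponse(False, True, True, True),
    RatedResponse(True, True, True, True),
    RatedResponse(False, False, True, False),
    RatedResponse(False, True, True, True),
]
print(score_rubric(mock))
# {'correctness': 25.0, 'sidedness': 75.0, 'modality': 100.0, 'approximation': 75.0}
```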
Pages: 8