Comparative Evaluation of AI Models Such as ChatGPT 3.5, ChatGPT 4.0, and Google Gemini in Neuroradiology Diagnostics

Cited by: 0
Authors
Gupta, Rishi [1 ]
Hamid, Abdullgabbar M. [1 ]
Jhaveri, Miral [1 ]
Patel, Niki [2 ]
Suthar, Pokhraj P. [1 ]
Affiliations
[1] Rush Univ, Med Ctr, Dept Diag Radiol & Nucl Med, Chicago, IL 60612 USA
[2] Kentucky Coll Osteopath Med, Dept Osteopath Med, Pikeville, KY USA
Keywords
chatgpt 3.5; chatgpt 4; neuroradiology; google gemini; ai
DOI
10.7759/cureus.67766
CLC Classification
R5 [Internal Medicine]
Discipline Codes
1002; 100201
Abstract
Aims and objectives: Advances in artificial intelligence (AI), particularly in large language models (LLMs) such as ChatGPT (versions 3.5 and 4.0) and Google Gemini, are transforming healthcare. This study evaluates the performance of these AI models on diagnostic quizzes from "Neuroradiology: A Core Review" to assess their potential as diagnostic tools in radiology.

Materials and methods: We assessed the accuracy of ChatGPT 3.5, ChatGPT 4.0, and Google Gemini on 262 multiple-choice questions covering the brain, head and neck, spine, and non-interpretive skills. Each AI tool provided answers and explanations, which were compared against the textbook answers. The analysis followed the STARD (Standards for Reporting of Diagnostic Accuracy Studies) guidelines, and accuracy was calculated for each AI tool overall and within each subgroup.

Results: ChatGPT 4.0 achieved the highest overall accuracy at 64.89%, outperforming ChatGPT 3.5 (62.60%) and Google Gemini (55.73%). ChatGPT 4.0 excelled in the brain and head and neck subgroups, while Google Gemini performed best in head and neck but lagged in the other areas. ChatGPT 3.5 showed consistent performance across all subgroups.

Conclusion: Advanced AI models, including ChatGPT 4.0 and Google Gemini, vary in diagnostic accuracy, with ChatGPT 4.0 leading at 64.89% overall. While these tools show promise for improving diagnostics and medical education, their effectiveness varies by subject area, and Google Gemini performs unevenly across categories. The study underscores the need for continued improvement and broader evaluation to address ethical concerns and optimize AI use in patient care.
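As a quick internal-consistency check, the reported overall percentages can be reproduced from the 262-question total. Below is a minimal Python sketch; the per-model correct-answer counts (170, 164, 146) are inferred by inverting the reported percentages against the question total and are not stated in the record, so treat them as assumptions.

    # Reproduce the overall accuracy figures from hypothetical per-model
    # correct-answer counts. The counts are inferred from the reported
    # percentages against the 262-question total (an assumption, not
    # data taken from the paper).
    TOTAL_QUESTIONS = 262

    correct_counts = {
        "ChatGPT 4.0": 170,    # 170 / 262 ~ 64.89%
        "ChatGPT 3.5": 164,    # 164 / 262 ~ 62.60%
        "Google Gemini": 146,  # 146 / 262 ~ 55.73%
    }

    for model, correct in correct_counts.items():
        accuracy = 100.0 * correct / TOTAL_QUESTIONS
        print(f"{model}: {correct}/{TOTAL_QUESTIONS} = {accuracy:.2f}%")

Running this prints 64.89%, 62.60%, and 55.73%, matching the abstract's figures exactly, which suggests the reported percentages are consistent with whole-number correct counts out of 262.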
Pages: 7
Related Papers
12 in total
  • [1] Performance of Google's Artificial Intelligence Chatbot "Bard" (Now "Gemini") on Ophthalmology Board Exam Practice Questions
    Botross, Monica
    Mohammadi, Seyed Omid
    Montgomery, Kendall
    Crawford, Courtney
    [J]. CUREUS JOURNAL OF MEDICAL SCIENCE, 2024, 16 (03)
  • [2] Dubey P, 2017, Neuroradiology: A Core Review
  • [3] Limitations of GPT-3.5 and GPT-4 in Applying Fleischner Society Guidelines to Incidental Lung Nodules
    Gamble, Joel
    Ferguson, Duncan
    Yuen, Joanna
    Sheikh, Adnan
    [J]. CANADIAN ASSOCIATION OF RADIOLOGISTS JOURNAL - JOURNAL DE L'ASSOCIATION CANADIENNE DES RADIOLOGISTES, 2024, 75 (02) : 412 - 416
  • [4] Advances in natural language processing
    Hirschberg, Julia
    Manning, Christopher D.
    [J]. SCIENCE, 2015, 349 (6245) : 261 - 266
  • [5] Comparing the Diagnostic Performance of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and Radiologists in Challenging Neuroradiology Cases
    Horiuchi, Daisuke
    Tatekawa, Hiroyuki
    Oura, Tatsushi
    Oue, Satoshi
    Walston, Shannon L.
    Takita, Hirotaka
    Matsushita, Shu
    Mitsuyama, Yasuhito
    Shimono, Taro
    Miki, Yukio
    Ueda, Daiju
    [J]. CLINICAL NEURORADIOLOGY, 2024 : 779 - 787
  • [6] Google Gemini and Bard artificial intelligence chatbot performance in ophthalmology knowledge assessment
    Mihalache, Andrew
    Grad, Justin
    Patil, Nikhil S.
    Huang, Ryan S.
    Popovic, Marko M.
    Mallipatna, Ashwin
    Kertes, Peter J.
    Muni, Rajeev H.
    [J]. EYE, 2024, 38 (13) : 2530 - 2535
  • [7] Ong JCL, 2024, LANCET DIGIT HEALTH, V6, pe428, DOI 10.1016/S2589-7500(24)00061-X
  • [8] Performance of GPT-4 on the American College of Radiology In-training Examination: Evaluating Accuracy, Model Drift, and Fine-tuning
    Payne, David L.
    Purohit, Kush
    Borrero, Walter Morales
    Chung, Katherine
    Hao, Max
    Mpoy, Mutshipay
    Jin, Michael
    Prasanna, Prateek
    Hill, Virginia
    [J]. ACADEMIC RADIOLOGY, 2024, 31 (07) : 3046 - 3054
  • [9] Opportunities, Challenges, and Future Directions of Generative Artificial Intelligence in Medical Education: Scoping Review
    Preiksaitis, Carl
    Rose, Christian
    [J]. JMIR MEDICAL EDUCATION, 2023, 9
  • [10] Evaluating GPT as an Adjunct for Radiologic Decision Making: GPT-4 Versus GPT-3.5 in a Breast Imaging Pilot
    Rao, Arya
    Kim, John
    Kamineni, Meghana
    Pang, Michael
    Lie, Winston
    Dreyer, Keith J.
    Succi, Marc D.
    [J]. JOURNAL OF THE AMERICAN COLLEGE OF RADIOLOGY, 2023, 20 (10) : 990 - 997