How do large language models answer breast cancer quiz questions? A comparative study of GPT-3.5, GPT-4 and Google Gemini

Cited by: 4
Authors
Irmici, Giovanni [1]
Cozzi, Andrea [2]
Della Pepa, Gianmarco [1]
De Berardinis, Claudia [1]
D'Ascoli, Elisa [1]
Cellina, Michaela [3]
Ce, Maurizio [4]
Depretto, Catherine [1]
Scaperrotta, Gianfranco [1]
Affiliations
[1] Fdn IRCCS Ist Nazl Tumori, Breast Radiol Dept, Via Giacomo Venezian 1, I-20133 Milan, Italy
[2] Ente Osped Cantonale EOC, Imaging Inst Southern Switzerland IIMSI, Lugano, Switzerland
[3] ASST Fatebenefratelli Sacco, Radiol Dept, Milan, Italy
[4] Univ Milan, Postgrad Sch Radiodiagnost, Milan, Italy
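Source
RADIOLOGIA MEDICA | 2024 / Volume 129 / Issue 10
Keywords
Large language models; ChatGPT; Google Gemini; Breast cancer
DOI
10.1007/s11547-024-01872-1
Chinese Library Classification (CLC) number
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Discipline classification code
1002; 100207; 1009
Abstract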
Applications of large language models (LLMs) in the healthcare field have shown promising results in processing and summarizing multidisciplinary information. This study evaluated the ability of three publicly available LLMs (GPT-3.5, GPT-4, and Google Gemini, then called Bard) to answer 60 multiple-choice questions (29 sourced from public databases, 31 newly formulated by experienced breast radiologists) about different aspects of breast cancer care: treatment and prognosis, diagnostic and interventional techniques, imaging interpretation, and pathology. Overall, the rate of correct answers differed significantly among LLMs (p = 0.010): the best performance was achieved by GPT-4 (95%, 57/60), followed by GPT-3.5 (90%, 54/60) and Google Gemini (80%, 48/60). For each LLM, no significant difference was observed between the rates of correct replies to questions sourced from public databases and to newly formulated ones (p ≥ 0.593). These results highlight the potential benefits of LLMs in breast cancer care, although the models will need to be further refined through in-context training.
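The percentages quoted in the abstract can be checked against the raw counts with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are hypothetical, and it does not attempt to reproduce the reported p-values, since the abstract does not name the statistical test and the per-question results needed for a paired comparison are not given here.

```python
# Counts taken from the abstract: correct answers out of 60 questions per model.
TOTAL_QUESTIONS = 60
correct = {"GPT-4": 57, "GPT-3.5": 54, "Google Gemini": 48}

# Question sources as reported in the abstract.
question_sources = {"public databases": 29, "newly formulated": 31}


def accuracy_percent(n_correct: int, n_total: int) -> float:
    """Return the share of correct answers as a percentage."""
    return 100.0 * n_correct / n_total


# The two question sources should add up to the full question set.
assert sum(question_sources.values()) == TOTAL_QUESTIONS

rates = {model: accuracy_percent(n, TOTAL_QUESTIONS) for model, n in correct.items()}

for model, rate in rates.items():
    print(f"{model}: {correct[model]}/{TOTAL_QUESTIONS} = {rate:.0f}%")
```

Run as written, this prints the three accuracy rates, which match the 95%, 90%, and 80% figures stated in the abstract.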
Pages: 1463-1467
Number of pages: 5