GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions

Times cited: 46
Authors
Guerra, Gage A. [1]
Hofmann, Hayden [1]
Sobhani, Sina [1]
Hofmann, Grady [2]
Gomez, David [1]
Soroudi, Daniel [3]
Hopkins, Benjamin S. [1]
Dallas, Jonathan [1]
Pangal, Dhiraj J. [1]
Cheok, Stephanie [1]
Nguyen, Vincent N. [1]
Mack, William J. [1]
Zada, Gabriel [1]
Affiliations
[1] Univ Southern Calif, Dept Neurosurg, Los Angeles, CA 90007 USA
[2] Stanford Univ, Dept Biol, Palo Alto, CA USA
[3] Univ Calif San Francisco, Sch Med, San Francisco, CA USA
Keywords
Artificial intelligence; ChatGPT; GPT-4; Machine learning; Neurosurgical boards; Neurosurgical training; SANS question
DOI
10.1016/j.wneu.2023.08.042
CLC classification number
R74 [Neurology and Psychiatry]
Abstract
BACKGROUND: Artificial intelligence (AI) and machine learning have transformed health care, with applications across many specialized fields. Neurosurgery can benefit from AI in surgical planning, patient outcome prediction, and neuroimaging analysis. GPT-4, an updated language model with additional training parameters, has performed exceptionally well on standardized exams. This study examines GPT-4's competence on neurosurgical board-style questions and compares its performance with that of medical students and residents to explore its potential in medical education and clinical decision-making.
METHODS: GPT-4 was tested on 643 Congress of Neurological Surgeons Self-Assessment Neurosurgery Exam (SANS) board-style questions spanning various neurosurgery subspecialties. Of these, 477 were text-based and 166 contained images. GPT-4 refused to answer the 52 questions that contained no text. The remaining 591 questions were entered into GPT-4, and its performance was evaluated on first-time responses. Raw scores were analyzed across subspecialties and question types and then compared with previous findings on the performance of Chat Generative Pre-trained Transformer (ChatGPT) against SANS users, medical students, and neurosurgery residents.
RESULTS: GPT-4 attempted 91.9% of the SANS questions (591 of 643) and achieved 76.6% accuracy. Its accuracy rose to 79.0% on text-only questions. GPT-4 outperformed ChatGPT (P < 0.001) and scored highest in the pain/peripheral nerve category (84%) and lowest in the spine category (73%). It exceeded the performance of medical students (26.3%), neurosurgery residents (61.5%), and the national average of SANS users (69.3%) across all categories.
CONCLUSIONS: GPT-4 significantly outperformed medical students, neurosurgery residents, and the national average of SANS users. The model's accuracy suggests potential applications in educational settings and clinical decision-making, where it could enhance provider efficiency and improve patient care.
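The question counts and headline percentages in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and the only inputs are figures stated in the abstract. It recomputes the attempt rate (591 of 643) and takes the naive percentage-point gap between GPT-4's overall score and each comparator group. Because the abstract does not say whether the 76.6% is computed over attempted or all questions, and because the comparator scores come from an earlier study, the gaps are a rough reading aid and not a statistical comparison.

```python
# Question counts reported in the abstract (Congress of Neurological Surgeons SANS set).
TEXT_BASED = 477
WITH_IMAGES = 166
REFUSED_NO_TEXT = 52  # questions with no text that GPT-4 declined to answer

# Overall scores (%) reported in the abstract.
GPT4_ACCURACY = 76.6
COMPARATORS = {
    "medical students": 26.3,
    "neurosurgery residents": 61.5,
    "SANS users (national average)": 69.3,
}


def attempt_summary(text_based: int, with_images: int, refused: int) -> tuple[int, int, float]:
    """Return (total questions, questions attempted, attempt rate in percent)."""
    total = text_based + with_images
    attempted = total - refused
    rate = round(100 * attempted / total, 1)
    return total, attempted, rate


def margins(model_score: float, others: dict[str, float]) -> dict[str, float]:
    """Naive percentage-point gap between the model's score and each comparator group."""
    return {group: round(model_score - score, 1) for group, score in others.items()}


if __name__ == "__main__":
    total, attempted, rate = attempt_summary(TEXT_BASED, WITH_IMAGES, REFUSED_NO_TEXT)
    print(f"{attempted}/{total} questions attempted ({rate}%)")
    for group, gap in margins(GPT4_ACCURACY, COMPARATORS).items():
        print(f"GPT-4 vs {group}: +{gap} percentage points")
```

Running it reproduces the 91.9% attempt rate stated in the abstract; the script deliberately does not reconstruct correct-answer counts, since the abstract does not give the denominator behind the 76.6% figure.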
Pages: E160-E165
Page count: 6