Beyond human in neurosurgical exams: ChatGPT's success in the Turkish neurosurgical society proficiency board exams

Cited by: 14
Authors
Sahin, Mustafa Caglar [1,7]
Sozer, Alperen [1]
Kuzucu, Pelin [1]
Turkmen, Tolga [2]
Sahin, Merve Buke [3]
Sozer, Ekin [4]
Tufek, Ozan Yavuz [1]
Nernekli, Kerem [5]
Emmez, Hakan [1]
Celtikci, Emrah [1,6]
Affiliations
[1] Gazi Univ, Fac Med, Dept Neurosurg, Ankara, Turkiye
[2] Minist Hlth, Dept Neurosurg, Dortyol State Hosp, Hatay, Turkiye
[3] Ankara Prov Hlth Directorate, Minist Hlth, Dept Publ Hlth, Ankara, Turkiye
[4] Gazi Univ, Directorate Hlth Culture & Sports, Ankara, Turkiye
[5] Stanford Univ, Sch Med, Dept Radiol, Stanford, CA 94305 USA
[6] Gazi Univ, Artificial Intelligence Ctr, Ankara, Turkiye
[7] Gazi Univ, Fac Med, Dept Neurosurg, TR-06500 Ankara, Turkiye
Keywords
Artificial intelligence; Board; ChatGPT; Education; Exam; Machine learning; Large language model
DOI
10.1016/j.compbiomed.2023.107807
Chinese Library Classification
Q [Biological Sciences]
Discipline Codes
07; 0710; 09
Abstract
Chat Generative Pre-Trained Transformer (ChatGPT) is a sophisticated natural language model that employs advanced deep learning techniques and is trained on extensive datasets to produce human-like conversational responses to user inputs. In this study, ChatGPT's performance on the Turkish Neurosurgical Society Proficiency Board Exams (TNSPBE) is compared with that of the actual candidates who took the exams, alongside an analysis of the types of questions it answered incorrectly, the quality of its responses, and its performance by question difficulty. For the rankings reported in this study, the scores of all 260 candidates were recalculated according to the specific exams they took and the questions included in those exams. Across a total of 523 questions, the candidates' average score was 62.02 ± 0.61, compared with 78.77 for ChatGPT. We conclude that, in addition to ChatGPT's higher rate of correct responses, its performance correlated with increasing response clarity (clarity scores of 1.5, 2.0, 2.5, and 3.0) regardless of question difficulty; the candidates showed no comparable increase with clarity.
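The record does not state which statistical test the authors applied. As a rough illustration only, the following minimal Python sketch shows how far ChatGPT's reported score (78.77) lies from the candidates' reported mean (62.02), under the assumption (not confirmed by this record) that the reported ±0.61 is the standard error of the candidate mean:

```python
from scipy import stats

# Values reported in the abstract. This is an illustrative sketch, not the
# paper's actual analysis; the assumption that 0.61 is the standard error
# of the mean is ours, not the authors'.
candidate_mean = 62.02
candidate_sem = 0.61    # assumed: standard error of the mean over 260 candidates
chatgpt_score = 78.77

# Treat ChatGPT's score as a fixed value and measure its distance from the
# candidate mean in SEM units (a simple z-style comparison).
z = (chatgpt_score - candidate_mean) / candidate_sem
p = stats.norm.sf(z)    # one-sided upper-tail p-value
print(f"z = {z:.2f}, one-sided p = {p:.3g}")
```

Under these assumptions the gap is roughly 27 standard errors, consistent with the abstract's claim that ChatGPT outperformed the candidate average.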
Pages: 8