Artificial intelligence in orthopaedics: can Chat Generative Pre-trained Transformer (ChatGPT) pass Section 1 of the Fellowship of the Royal College of Surgeons (Trauma & Orthopaedics) examination?

Cited by: 31
Authors
Cuthbert, Rory [1 ]
Simpson, Ashley I. [1]
Affiliations
[1] Guy's & St Thomas' NHS Foundation Trust, London SE1 9RT, England
Keywords
artificial intelligence; trauma; orthopaedics; surgeons; examination;
DOI
10.1093/postmj/qgad053
Chinese Library Classification
R5 [Internal Medicine];
Subject classification codes
1002; 100201;
Abstract
Purpose: Chat Generative Pre-trained Transformer (ChatGPT) is a large language artificial intelligence (AI) model which generates contextually relevant text in response to questioning. After ChatGPT successfully passed the United States Medical Licensing Examinations, proponents have argued it should play an increasing role in medical service provision and education. AI in healthcare remains in its infancy, and the reliability of AI systems must be scrutinized. This study assessed whether ChatGPT could pass Section 1 of the Fellowship of the Royal College of Surgeons (FRCS) examination in Trauma and Orthopaedic Surgery.

Methods: The UK and Ireland In-Training Examination (UKITE) was used as a surrogate for the FRCS. Papers 1 and 2 of UKITE 2022 were inputted directly into ChatGPT. All questions were in a single-best-answer format, with no alterations to their wording. Questions containing imaging were trialled to confirm whether ChatGPT utilized this information.

Results: ChatGPT scored 35.8%: 30% lower than the FRCS pass rate and 8.2% lower than the mean score achieved by human candidates of all training levels. Subspecialty analysis demonstrated that ChatGPT scored highest in basic science (53.3%) and lowest in trauma (0%). Of the 87 questions answered incorrectly, ChatGPT stated it did not know the answer only once, and gave incorrect explanatory answers for the remaining questions.

Conclusion: ChatGPT is currently unable to exert the higher-order judgement and multilogical thinking required to pass the FRCS examination. Further, the current model fails to recognize its own limitations. ChatGPT's deficiencies should be publicized as widely as its successes to ensure clinicians remain aware of its fallibility.

Key messages

What is already known on this topic: Following ChatGPT's much-publicized success in passing the United States Medical Licensing Examinations, clinicians and medical students are using the model increasingly frequently for medical service provision and education. However, ChatGPT remains in its infancy, and the model's reliability and accuracy remain unproven.

What this study adds: This study demonstrates that ChatGPT is currently unable to exert the higher-order judgement and multilogical thinking required to pass the Fellowship of the Royal College of Surgeons (FRCS) (Trauma & Orthopaedics) examination. Further, the current model fails to recognize its own limitations when offering both direct and explanatory answers.

How this study might affect research, practice, or policy: This study highlights the need for medical students and clinicians to exert caution when employing ChatGPT as a revision tool or applying it in clinical practice, and for patients to be aware of its fallibilities when using it as a health resource. Future research questions include: How can the reliability of ChatGPT's responses be regulated moving forward? Do postgraduate examinations adequately focus on higher-order judgement and multilogical thinking rather than simple fact recall? Will future models of ChatGPT develop the higher-order judgement required to pass the FRCS (Trauma & Orthopaedics) examination?
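The overall and subspecialty percentages reported above follow from simple per-question tallying of single-best-answer results. The sketch below shows one way such a tally could be computed; the function name, the `(subspecialty, is_correct)` input format, and the sample figures are illustrative assumptions, not the authors' actual analysis pipeline.

```python
from collections import defaultdict

def score_by_subspecialty(results):
    """Tally single-best-answer outcomes into overall and per-subspecialty
    percentage scores.

    `results` is a list of (subspecialty, is_correct) pairs, one per question
    attempted. Percentages are rounded to one decimal place, matching the
    precision used in the abstract (e.g. 35.8%, 53.3%).
    """
    # subspecialty -> [number correct, number attempted]
    totals = defaultdict(lambda: [0, 0])
    for subspecialty, is_correct in results:
        totals[subspecialty][1] += 1
        if is_correct:
            totals[subspecialty][0] += 1

    per_subspecialty = {
        name: round(100 * correct / attempted, 1)
        for name, (correct, attempted) in totals.items()
    }
    attempted_all = sum(n for _, n in totals.values())
    correct_all = sum(c for c, _ in totals.values())
    overall = round(100 * correct_all / attempted_all, 1)
    return overall, per_subspecialty
```

For example, 8 correct out of 15 basic-science questions yields 53.3% for that subspecialty, independent of how the remaining questions are distributed.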
Pages: 1110-1114
Page count: 5