Enhancements in artificial intelligence for medical examinations: A leap from ChatGPT 3.5 to ChatGPT 4.0 in the FRCS trauma & orthopaedics examination

Cited by: 2
Authors
Khan, Akib Majed [1]
Sarraf, Khaled Maher [1]
Simpson, Ashley Iain [2]
Affiliations
[1] Imperial Coll Healthcare NHS Trust, Praed St, London W2 1NY, England
[2] Royal Natl Orthopaed Hosp, Brockley Hill, Stanmore HA7 4LP, England
Source
SURGEON-JOURNAL OF THE ROYAL COLLEGES OF SURGEONS OF EDINBURGH AND IRELAND | 2025 / Vol. 23 / Issue 01
Keywords
Artificial intelligence; ChatGPT; FRCS; Trauma & orthopaedics; Medical education;
DOI
10.1016/j.surge.2024.11.008
Chinese Library Classification (CLC) number
R61 [Operative surgery];
Abstract
Introduction: ChatGPT is a sophisticated AI model capable of generating human-like text based on the input it receives. In previous studies, ChatGPT 3.5 was unable to pass the FRCS (Tr&Orth) examination owing to a lack of higher-order judgement. Enhancements in ChatGPT 4.0 warrant an evaluation of its performance.
Methodology: Questions from the UK-based December 2022 In-Training examination were input into ChatGPT 3.5 and 4.0. The methodology of a prior study was replicated to maintain consistency, allowing a direct comparison between the two model versions. The performance threshold remained at 65.8 %, aligning with the November 2022 sitting of Section 1 of the FRCS (Tr&Orth).
Results: ChatGPT 4.0 achieved a passing score (73.9 %), indicating an improvement in its ability to analyse clinical information and make decisions reflective of a competent trauma and orthopaedic consultant. Version 3.5 scored 38.1 % lower than ChatGPT 4.0, a significant difference (p < 0.0001; chi-square). The breakdown by subspecialty further demonstrated version 4.0's enhanced understanding and application in complex clinical scenarios. ChatGPT 4.0 also showed a statistically significant improvement over its predecessor in answering image-based questions (p = 0.0069).
Conclusion: ChatGPT 4.0's success in passing Section 1 of the FRCS (Tr&Orth) examination highlights the rapid evolution of AI technologies and their potential applications in healthcare and education.
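The comparison described in the abstract (a pass-mark check followed by a chi-square test on the two models' proportions of correct answers) can be sketched as follows. This is only an illustration under stated assumptions: the abstract gives neither the number of questions nor whether a continuity correction was applied, so the counts below (74 and 36 correct out of 100) are hypothetical round numbers, not the study's data, and the function names are invented for this sketch. The test shown is an uncorrected Pearson chi-square on a 2x2 table.

```python
import math

PASS_MARK = 65.8  # percent; the threshold quoted in the abstract


def passes_threshold(score_percent, pass_mark=PASS_MARK):
    """Return True if a percentage score meets or exceeds the pass mark."""
    return score_percent >= pass_mark


def chi_square_2x2(correct_a, total_a, correct_b, total_b):
    """Pearson chi-square test (no continuity correction) for two proportions.

    Returns (statistic, p_value) for a 2x2 table with one degree of freedom.
    """
    observed = [
        [correct_a, total_a - correct_a],
        [correct_b, total_b - correct_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts for illustration only (NOT the study's data).
stat, p = chi_square_2x2(correct_a=74, total_a=100, correct_b=36, total_b=100)
print(f"chi-square = {stat:.2f}, p = {p:.2e}")
print("73.9 % passes:", passes_threshold(73.9))
```

With real data, the per-model counts of correct answers would replace the hypothetical arguments; a library routine such as SciPy's `chi2_contingency` could be used instead of the hand-rolled statistic.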
Pages: 13-17
Number of pages: 5
Related papers
21 in total
[1]   Brown TB, 2020, ADV NEUR IN, V33
[2]   Artificial intelligence in orthopaedics: can Chat Generative Pre-trained Transformer (ChatGPT) pass Section 1 of the Fellowship of the Royal College of Surgeons (Trauma & Orthopaedics) examination? [J].
Cuthbert, Rory ;
Simpson, Ashley I. .
POSTGRADUATE MEDICAL JOURNAL, 2023, 99 (1176) :1110-1114
[3]   On the ethics of algorithmic decision-making in healthcare [J].
Grote, Thomas ;
Berens, Philipp .
JOURNAL OF MEDICAL ETHICS, 2020, 46 (03) :205-211
[4]   Deep Residual Learning for Image Recognition [J].
He, Kaiming ;
Zhang, Xiangyu ;
Ren, Shaoqing ;
Sun, Jian .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :770-778
[5]   ISCP, 2024, Trauma & orthopaedic surgery curriculum 2021, P104
[6]   JCIE, 2024, Intercollegiate specialty examination in trauma & orthopaedic surgery-regulations, P4, Patent No. 20152023
[7]   Validating the Interpretations and Uses of Test Scores [J].
Kane, Michael T. .
JOURNAL OF EDUCATIONAL MEASUREMENT, 2013, 50 (01) :1-73
[8]   Karpov OE, 2023, Int J Environ Res Public Health, V20
[9]   Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models [J].
Kung, Tiffany H. ;
Cheatham, Morgan ;
Medenilla, Arielle ;
Sillos, Czarina ;
De Leon, Lorie ;
Elepano, Camille ;
Madriaga, Maria ;
Aggabao, Rimel ;
Diaz-Candido, Giezel ;
Maningo, James ;
Tseng, Victor .
PLOS DIGITAL HEALTH, 2023, 2 (02)
[10]   Deep learning [J].
LeCun, Yann ;
Bengio, Yoshua ;
Hinton, Geoffrey .
NATURE, 2015, 521 (7553) :436-444