Enhancements in artificial intelligence for medical examinations: A leap from ChatGPT 3.5 to ChatGPT 4.0 in the FRCS trauma & orthopaedics examination

Cited by: 2

Authors:

Khan, Akib Majed ^{[1]}

Sarraf, Khaled Maher ^{[1]}

Simpson, Ashley Iain ^{[2]}

Affiliations:

[1] Imperial Coll Healthcare NHS Trust, Praed St, London W2 1NY, England

[2] Royal Natl Orthopaed Hosp, Brockley Hill, Stanmore HA7 4LP, England

Source:

SURGEON-JOURNAL OF THE ROYAL COLLEGES OF SURGEONS OF EDINBURGH AND IRELAND | 2025 / Vol. 23 / Issue 01

Keywords:

Artificial intelligence; ChatGPT; FRCS; Trauma & orthopaedics; Medical education;

DOI:

10.1016/j.surge.2024.11.008

Chinese Library Classification (CLC) number:

R61 [Operative surgery];

Discipline classification code:

Abstract:

Introduction: ChatGPT is a sophisticated AI model capable of generating human-like text based on the input it receives. In previous studies, ChatGPT 3.5 was unable to pass the FRCS (Tr&Orth) examination because it lacked higher-order judgement. Enhancements in ChatGPT 4.0 warrant an evaluation of its performance. Methodology: Questions from the UK-based December 2022 In-Training examination were input into ChatGPT 3.5 and 4.0. The methodology of a prior study was replicated to maintain consistency, allowing a direct comparison between the two model versions. The performance threshold remained at 65.8 %, aligning with the November 2022 sitting of Section 1 of the FRCS (Tr&Orth). Results: ChatGPT 4.0 achieved a passing score (73.9 %), indicating an improvement in its ability to analyse clinical information and make decisions reflective of a competent trauma and orthopaedic consultant. Compared to ChatGPT 4.0, version 3.5 scored 38.1 % lower, which represents a significant difference (p < 0.0001; Chi-square test). The breakdown by subspecialty further demonstrated version 4.0's enhanced understanding and application in complex clinical scenarios. ChatGPT 4.0 showed a statistically significant improvement in answering image-based questions (p = 0.0069) compared to its predecessor. Conclusion: ChatGPT 4.0's success in passing Section 1 of the FRCS (Tr&Orth) examination highlights the rapid evolution of AI technologies and their potential applications in healthcare and education.
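The comparison described in the abstract (two models scored on the same question set, checked against a pass mark and compared with a Chi-square test) can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names `chi_square_2x2` and `passes` are hypothetical, the abstract does not report the number of questions, and it does not say whether a continuity correction was applied. The sketch uses the uncorrected Pearson statistic and the standard library only.

```python
import math


def chi_square_2x2(correct_a, total_a, correct_b, total_b):
    """Pearson chi-square test (no continuity correction) on a 2x2 table.

    Rows are the two models; columns are correct / incorrect answers.
    Assumes every expected cell count is non-zero.
    Returns (chi-square statistic, two-sided p-value with 1 degree of freedom).
    """
    table = [
        [correct_a, total_a - correct_a],
        [correct_b, total_b - correct_b],
    ]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def passes(score_percent, threshold_percent=65.8):
    """Check a percentage score against the pass mark quoted in the abstract."""
    return score_percent >= threshold_percent


if __name__ == "__main__":
    # Hypothetical counts for illustration only; NOT taken from the study.
    chi2, p = chi_square_2x2(correct_a=74, total_a=100, correct_b=36, total_b=100)
    print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
    print("73.9 % passes:", passes(73.9))
```

With the study's real per-model counts substituted for the illustrative ones, the same function would give the uncorrected Pearson statistic for that table; a library routine such as SciPy's contingency-table test would be the more usual choice in practice.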

Citation

Pages: 13-17

Page count: 5

21 references in total

[1] Brown TB, 2020, ADV NEUR IN, V33

[2] Artificial intelligence in orthopaedics: can Chat Generative Pre-trained Transformer (ChatGPT) pass Section 1 of the Fellowship of the Royal College of Surgeons (Trauma & Orthopaedics) examination? [J].

Cuthbert, Rory; Simpson, Ashley I.

POSTGRADUATE MEDICAL JOURNAL, 2023, 99 (1176) :1110-1114

[3] On the ethics of algorithmic decision-making in healthcare [J].

Grote, Thomas; Berens, Philipp.

JOURNAL OF MEDICAL ETHICS, 2020, 46 (03) :205-211

[4] Deep Residual Learning for Image Recognition [J].

He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian.

2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :770-778

[5] ISCP ISCP, 2024, Trauma & orthopaedic surgery curriculum 2021, P104

[6] JCIE JCoIE, 2024, Intercollegiate specialty examination in trauma & orthopaedic surgery-regulations, P4, Patent No. 20152023

[7] Validating the Interpretations and Uses of Test Scores [J].

Kane, Michael T.

JOURNAL OF EDUCATIONAL MEASUREMENT, 2013, 50 (01) :1-73

[8] Karpov OE, 2023, Int J Environ Res Public Health, V20

[9] Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models [J].

Kung, Tiffany H.; Cheatham, Morgan; Medenilla, Arielle; Sillos, Czarina; De Leon, Lorie; Elepano, Camille; Madriaga, Maria; Aggabao, Rimel; Diaz-Candido, Giezel; Maningo, James; Tseng, Victor.

PLOS DIGITAL HEALTH, 2023, 2 (02)

[10] Deep learning [J].

LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey.

NATURE, 2015, 521 (7553) :436-444
