ChatGPT Performs at the Level of a Third-Year Orthopaedic Surgery Resident on the Orthopaedic In-Training Examination

Cited by: 10
Authors
Ghanem, Diane [1 ,2 ]
Covarrubias, Oscar [1 ,3 ]
Raad, Micheal [1 ,2 ]
LaPorte, Dawn [1 ,2 ]
Shafiq, Babar [1 ,2 ]
Institutions
[1] Johns Hopkins Univ Hosp, Baltimore, MD 21287 USA
[2] Johns Hopkins Univ Hosp, Dept Orthopaed Surg, Baltimore, MD 21287 USA
[3] Johns Hopkins Univ, Sch Med, Baltimore, MD 21287 USA
DOI
10.2106/JBJS.OA.23.00103
CLC Classification: R826.8 [Plastic Surgery]; R782.2 [Oral and Maxillofacial Plastic Surgery]; R726.2 [Pediatric Plastic Surgery]; R62 [Plastic Surgery (Reconstructive Surgery)]
Abstract
Introduction: Publicly available AI language models such as ChatGPT have demonstrated utility in text generation and even in problem-solving when given clear instructions. Amid this transformative shift, this study assesses ChatGPT's performance on the Orthopaedic In-Training Examination (OITE).
Methods: All 213 web-based questions from the 2021 OITE were retrieved from the AAOS ResStudy website (https://www.aaos.org/education/examinations/ResStudy). Two independent reviewers copied and pasted the questions and response options into ChatGPT Plus (version 4.0) and recorded the generated answers. All media-containing questions were flagged and carefully examined. Twelve media-containing questions that relied purely on images (clinical pictures, radiographs, MRIs, CT scans) and could not be rationalized from the clinical presentation were excluded. Cohen's kappa coefficient was used to examine agreement between the ChatGPT-generated responses recorded by the two reviewers. Descriptive statistics were used to summarize ChatGPT Plus's performance (% correct). The 2021 norm table was used to compare ChatGPT Plus's OITE performance with that of national orthopaedic surgery residents in the same year.
Results: A total of 201 questions were evaluated by ChatGPT Plus. Excellent agreement was observed between raters for the 201 ChatGPT-generated responses, with a Cohen's kappa coefficient of 0.947. Media-containing questions accounted for 45.8% (92/201) of the set. ChatGPT scored 61.2% (123/201) overall and 64.2% (70/109) on non-media questions. Compared with the performance of all national orthopaedic surgery residents in 2021, ChatGPT Plus performed at the level of an average PGY-3.
Discussion: ChatGPT Plus passed the OITE with an overall score of 61.2%, ranking at the level of a third-year orthopaedic surgery resident. It provided logical reasoning and justifications that may help residents improve their understanding of OITE cases and general orthopaedic principles. Further studies are needed to examine such models' efficacy and their impact on long-term learning and OITE/ABOS performance.
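The Methods section reports inter-rater agreement via Cohen's kappa (0.947 between the two reviewers' recorded ChatGPT answers). As a minimal sketch of how that statistic is computed — the answer labels below are hypothetical examples, not the study's data:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the agreement expected by chance from
    each rater's marginal label frequencies.
    """
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("raters must label the same non-empty item set")
    n = len(rater1)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal label counts.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[label] * c2[label] for label in c1) / (n * n)
    if p_e == 1:
        return 1.0  # degenerate case: both raters used a single identical label
    return (p_o - p_e) / (1 - p_e)

# Hypothetical recorded multiple-choice answers from two reviewers:
reviewer1 = ["A", "A", "B", "C"]
reviewer2 = ["A", "A", "B", "B"]
print(cohens_kappa(reviewer1, reviewer2))  # → 0.6
```

Values near 1 indicate near-perfect agreement beyond chance, which is why the reported 0.947 is described as excellent.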
Pages: 7