Matching Human Expertise: ChatGPT's Performance on Hand Surgery Examinations

Cited by: 1
Authors
Kirschenbaum, Zachary A. [1]
Han, Yuri [1]
Vrindten, Kiera L. [1]
Wang, Hanbin [1]
Cody, Ron [2]
Katt, Brian M. [1]
Kirschenbaum, David [1]
Affiliations
[1] Rutgers Robert Wood Johnson Med Sch, New Brunswick, NJ USA
[2] SAS Inst, Cary, NC USA
Source
HAND-AMERICAN ASSOCIATION FOR HAND SURGERY | 2025
Keywords
ChatGPT; AI; education; certification; self-assessment
DOI
10.1177/15589447251322914
Chinese Library Classification (CLC) Codes
R826.8 [Plastic Surgery]; R782.2 [Oral and Maxillofacial Plastic Surgery]; R726.2 [Pediatric Plastic Surgery]; R62 [Plastic Surgery (Reparative Surgery)]
Abstract
Background: The integration of artificial intelligence (AI) into health care has seen significant advances, particularly with AI-driven tools such as ChatGPT. Initial evaluations indicated that ChatGPT 3.5 did not perform as well as humans on specialized hand surgery self-assessment examinations. The purpose of this study is to evaluate the performance of ChatGPT 4o on American Society for Surgery of the Hand (ASSH) self-assessment questions and to determine whether enhanced techniques, such as improved prompts and file search, increase accuracy.

Methods: Using data from the ASSH self-assessment examinations (2008-2013), we explored the impact of ChatGPT model version, prompt design, and file search on the accuracy of AI-generated responses. We used OpenAI's application programming interface (API) to automate question input and response scoring. Statistical analysis was conducted using one-way analysis of variance, and the Kuder-Richardson Formula 20 (KR-20) was used to assess test reliability.

Results: The latest AI models, particularly ChatGPT 4o with enhanced prompting and access to peer-reviewed literature, achieved performance comparable to human examinees, particularly on text-based questions. ChatGPT 4o performed significantly better than ChatGPT 3.5 and showed marked improvement with better prompts and file search capabilities. The KR-20 for the 2013 examination was 0.946, indicating a highly reliable test.

Conclusions: These findings highlight AI's potential to support medical education and practice, demonstrating that ChatGPT can perform at a human-equivalent level on hand surgery self-assessment examinations. Our results suggest potential utility as a supplementary tool in educational settings and as a supportive resource in clinical practice.
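The Methods paragraph describes two concrete steps: submitting each examination item programmatically through OpenAI's API and computing KR-20 to gauge test reliability. Below is a minimal Python sketch of how such a pipeline might look; it is illustrative only and not the authors' published code. The model name, prompt wording, helper names (ask_question, kr20), and the dummy response matrix are assumptions.

# Minimal sketch (illustrative, not the authors' pipeline): submit one exam item
# to an OpenAI model and compute KR-20 reliability from a 0/1 score matrix.
import numpy as np
from openai import OpenAI

def ask_question(stem: str, options: dict[str, str]) -> str:
    """Submit one multiple-choice item and return the option letter the model picks."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    prompt = stem + "\n" + "\n".join(f"{letter}. {text}" for letter, text in options.items())
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed model name for illustration
        messages=[
            {"role": "system", "content": "Answer with the single best option letter only."},
            {"role": "user", "content": prompt},
        ],
    )
    return resp.choices[0].message.content.strip()[0].upper()

def kr20(scored: np.ndarray) -> float:
    """Kuder-Richardson Formula 20 for a 0/1 matrix of examinees (rows) by items (columns)."""
    k = scored.shape[1]                          # number of items
    p = scored.mean(axis=0)                      # proportion answering each item correctly
    q = 1.0 - p
    var_total = scored.sum(axis=1).var(ddof=1)   # variance of examinees' total scores
    return (k / (k - 1)) * (1.0 - (p * q).sum() / var_total)

# Dummy data: 5 simulated examinees x 5 items, 1 = correct, 0 = incorrect.
scored = np.array([
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
])
print(f"KR-20 = {kr20(scored):.3f}")  # about 0.98 for this dummy matrix

The 0.946 reliability reported for the 2013 examination is this same KR-20 statistic, computed over the actual item-response data rather than the dummy matrix shown here.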
Pages: 8