Matching Human Expertise: ChatGPT's Performance on Hand Surgery Examinations

Cited by: 1
Authors
Kirschenbaum, Zachary A. [1]
Han, Yuri [1]
Vrindten, Kiera L. [1]
Wang, Hanbin [1]
Cody, Ron [2]
Katt, Brian M. [1]
Kirschenbaum, David [1]
Affiliations
[1] Rutgers Robert Wood Johnson Med Sch, New Brunswick, NJ USA
[2] SAS Inst, Cary, NC USA
Source
HAND-AMERICAN ASSOCIATION FOR HAND SURGERY | 2025
Keywords
ChatGPT; AI; education; certification; self-assessment
DOI
10.1177/15589447251322914
Chinese Library Classification (CLC) Codes
R826.8 [Plastic Surgery]; R782.2 [Oral and Maxillofacial Plastic Surgery]; R726.2 [Pediatric Plastic Surgery]; R62 [Plastic Surgery (Reparative Surgery)]
Abstract
Background: The integration of artificial intelligence (AI) into health care has seen significant advances, particularly with AI-driven tools such as ChatGPT. Initial evaluations indicated that ChatGPT 3.5 did not perform as well as humans on specialized hand surgery self-assessment examinations. The purpose of this study is to evaluate the performance of ChatGPT 4o on American Society for Surgery of the Hand (ASSH) self-assessment questions and to determine whether enhanced techniques, such as improved prompts and file search, increase accuracy.

Methods: Using data from the ASSH self-assessment examinations (2008-2013), we explored the impact of ChatGPT model version, prompt design, and file search on the accuracy of AI-generated responses. We used OpenAI's application programming interface (API) to automate question input and response scoring. Statistical analysis was conducted using one-way analysis of variance, and the Kuder-Richardson Formula 20 (KR-20) was used to assess test reliability.

Results: The latest AI models, particularly ChatGPT 4o with enhanced prompting and access to peer-reviewed literature, achieved performance comparable to human examinees, particularly on text-based questions. ChatGPT 4o performed significantly better than ChatGPT 3.5 and showed marked improvement with better prompts and file search capabilities. The KR-20 for the 2013 examination was 0.946, indicating a highly reliable test.

Conclusions: These findings highlight AI's potential to support medical education and practice, demonstrating that ChatGPT can perform at a human-equivalent level on hand surgery self-assessment examinations. Our results suggest potential utility as a supplementary tool in educational settings and as a supportive resource in clinical practice.
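The Methods paragraph describes two concrete steps: submitting each examination item programmatically through OpenAI's API and computing KR-20 to gauge test reliability. Below is a minimal Python sketch of how such a pipeline might look; it is illustrative only and not the authors' published code. The model name, prompt wording, helper names (ask_question, kr20), and the dummy response matrix are assumptions.

# Minimal sketch (illustrative, not the authors' pipeline): submit one exam item
# to an OpenAI model and compute KR-20 reliability from a 0/1 score matrix.
import numpy as np
from openai import OpenAI

def ask_question(stem: str, options: dict[str, str]) -> str:
    """Submit one multiple-choice item and return the option letter the model picks."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    prompt = stem + "\n" + "\n".join(f"{letter}. {text}" for letter, text in options.items())
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed model name for illustration
        messages=[
            {"role": "system", "content": "Answer with the single best option letter only."},
            {"role": "user", "content": prompt},
        ],
    )
    return resp.choices[0].message.content.strip()[0].upper()

def kr20(scored: np.ndarray) -> float:
    """Kuder-Richardson Formula 20 for a 0/1 matrix of examinees (rows) by items (columns)."""
    k = scored.shape[1]                          # number of items
    p = scored.mean(axis=0)                      # proportion answering each item correctly
    q = 1.0 - p
    var_total = scored.sum(axis=1).var(ddof=1)   # variance of examinees' total scores
    return (k / (k - 1)) * (1.0 - (p * q).sum() / var_total)

# Dummy data: 5 simulated examinees x 5 items, 1 = correct, 0 = incorrect.
scored = np.array([
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
])
print(f"KR-20 = {kr20(scored):.3f}")  # about 0.98 for this dummy matrix

The 0.946 reliability reported for the 2013 examination is this same KR-20 statistic, computed over the actual item-response data rather than the dummy matrix shown here.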
Pages: 8