Evaluation of a Large Language Model's Ability to Assist in an Orthopedic Hand Clinic

Cited by: 0
Authors
Kotzur, Travis [1, 2]
Singh, Aaron [1 ]
Parker, John [1 ]
Peterson, Blaire [1 ]
Sager, Brian [1 ]
Rose, Ryan [1 ]
Corley, Fred [1 ]
Brady, Christina [1 ]
Affiliations
[1] UT Hlth San Antonio, San Antonio, TX USA
[2] UT Hlth San Antonio, Dept Orthopaed, 7703 Floyd Curl Dr,MC-7774, San Antonio, TX 78229 USA
Source
HAND-AMERICAN ASSOCIATION FOR HAND SURGERY | 2024
Keywords
artificial intelligence; ChatGPT; GPT-4; large language model; machine learning; hand surgery; orthopedics; SAGITTAL BAND
DOI
10.1177/15589447241257643
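Chinese Library Classification (CLC) number
R826.8 [Plastic surgery]; R782.2 [Oral and maxillofacial plastic surgery]; R726.2 [Pediatric plastic surgery]; R62 [Plastic surgery (reconstructive surgery)]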
Abstract
Background: Advancements in artificial intelligence technology, such as OpenAI's large language model, ChatGPT, could transform medicine through applications in a clinical setting. This study aimed to assess the utility of ChatGPT as a clinical assistant in an orthopedic hand clinic.
Methods: Nine clinical vignettes, describing various common and uncommon hand pathologies, were constructed and reviewed by 4 fellowship-trained orthopedic hand surgeons and an orthopedic resident. ChatGPT was given these vignettes and asked to generate a differential diagnosis, potential workup plan, and provide treatment options for its top differential. Responses were graded for accuracy and the overall utility scored on a 5-point Likert scale.
Results: The diagnostic accuracy of ChatGPT was 7 out of 9 cases, indicating an overall accuracy rate of 78%. ChatGPT was less reliable with more complex pathologies and failed to identify an intentionally incorrect presentation. ChatGPT received a score of 3.8 ± 1.4 for correct diagnosis, 3.4 ± 1.4 for helpfulness in guiding patient management, 4.1 ± 1.0 for appropriate workup for the actual diagnosis, 4.3 ± 0.8 for an appropriate recommended treatment plan for the diagnosis, and 4.4 ± 0.8 for the helpfulness of treatment options in managing patients.
Conclusion: ChatGPT was successful in diagnosing most of the conditions; however, the overall utility of its advice was variable. While it performed well in recommending treatments, it faced difficulties in providing appropriate diagnoses for uncommon pathologies. In addition, it failed to identify an obvious error in presenting pathology.
Pages: 10