Evaluation of a Large Language Model's Ability to Assist in an Orthopedic Hand Clinic

Authors
Kotzur, Travis [1 ,2 ]
Singh, Aaron [1 ]
Parker, John [1 ]
Peterson, Blaire [1 ]
Sager, Brian [1 ]
Rose, Ryan [1 ]
Corley, Fred [1 ]
Brady, Christina [1 ]
Affiliations
[1] UT Hlth San Antonio, San Antonio, TX USA
[2] UT Hlth San Antonio, Dept Orthopaed, 7703 Floyd Curl Dr,MC-7774, San Antonio, TX 78229 USA
Source
HAND-AMERICAN ASSOCIATION FOR HAND SURGERY | 2024
Keywords
artificial intelligence; ChatGPT; GPT-4; large language model; machine learning; hand surgery; orthopedics; sagittal band
DOI
10.1177/15589447241257643
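Chinese Library Classification (CLC) number
R826.8 [Plastic surgery]; R782.2 [Oral and maxillofacial plastic surgery]; R726.2 [Pediatric plastic surgery]; R62 [Plastic surgery (reconstructive surgery)]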
Abstract
Background: Advancements in artificial intelligence technology, such as OpenAI's large language model, ChatGPT, could transform medicine through applications in a clinical setting. This study aimed to assess the utility of ChatGPT as a clinical assistant in an orthopedic hand clinic.

Methods: Nine clinical vignettes, describing various common and uncommon hand pathologies, were constructed and reviewed by 4 fellowship-trained orthopedic hand surgeons and an orthopedic resident. ChatGPT was given these vignettes and asked to generate a differential diagnosis, potential workup plan, and provide treatment options for its top differential. Responses were graded for accuracy and the overall utility scored on a 5-point Likert scale.

Results: The diagnostic accuracy of ChatGPT was 7 out of 9 cases, indicating an overall accuracy rate of 78%. ChatGPT was less reliable with more complex pathologies and failed to identify an intentionally incorrect presentation. ChatGPT received a score of 3.8 ± 1.4 for correct diagnosis, 3.4 ± 1.4 for helpfulness in guiding patient management, 4.1 ± 1.0 for appropriate workup for the actual diagnosis, 4.3 ± 0.8 for an appropriate recommended treatment plan for the diagnosis, and 4.4 ± 0.8 for the helpfulness of treatment options in managing patients.

Conclusion: ChatGPT was successful in diagnosing most of the conditions; however, the overall utility of its advice was variable. While it performed well in recommending treatments, it faced difficulties in providing appropriate diagnoses for uncommon pathologies. In addition, it failed to identify an obvious error in presenting pathology.
Pages: 10