Evaluation of a Large Language Model's Ability to Assist in an Orthopedic Hand Clinic

Cited by: 0
Authors
Kotzur, Travis [1 ,2 ]
Singh, Aaron [1 ]
Parker, John [1 ]
Peterson, Blaire [1 ]
Sager, Brian [1 ]
Rose, Ryan [1 ]
Corley, Fred [1 ]
Brady, Christina [1 ]
Affiliations
[1] UT Hlth San Antonio, San Antonio, TX USA
[2] UT Hlth San Antonio, Dept Orthopaed, 7703 Floyd Curl Dr,MC-7774, San Antonio, TX 78229 USA
Source
HAND-AMERICAN ASSOCIATION FOR HAND SURGERY | 2024
Keywords
artificial intelligence; ChatGPT; GPT-4; large language model; machine learning; hand surgery; orthopedics; SAGITTAL BAND
DOI
10.1177/15589447241257643
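Chinese Library Classification (CLC) number
R826.8 [Plastic surgery]; R782.2 [Oral and maxillofacial plastic surgery]; R726.2 [Pediatric plastic surgery]; R62 [Plastic surgery (reconstructive surgery)]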
Abstract
Background: Advancements in artificial intelligence technology, such as OpenAI's large language model, ChatGPT, could transform medicine through applications in a clinical setting. This study aimed to assess the utility of ChatGPT as a clinical assistant in an orthopedic hand clinic.

Methods: Nine clinical vignettes, describing various common and uncommon hand pathologies, were constructed and reviewed by 4 fellowship-trained orthopedic hand surgeons and an orthopedic resident. ChatGPT was given these vignettes and asked to generate a differential diagnosis, potential workup plan, and provide treatment options for its top differential. Responses were graded for accuracy and the overall utility scored on a 5-point Likert scale.

Results: The diagnostic accuracy of ChatGPT was 7 out of 9 cases, indicating an overall accuracy rate of 78%. ChatGPT was less reliable with more complex pathologies and failed to identify an intentionally incorrect presentation. ChatGPT received a score of 3.8 ± 1.4 for correct diagnosis, 3.4 ± 1.4 for helpfulness in guiding patient management, 4.1 ± 1.0 for appropriate workup for the actual diagnosis, 4.3 ± 0.8 for an appropriate recommended treatment plan for the diagnosis, and 4.4 ± 0.8 for the helpfulness of treatment options in managing patients.

Conclusion: ChatGPT was successful in diagnosing most of the conditions; however, the overall utility of its advice was variable. While it performed well in recommending treatments, it faced difficulties in providing appropriate diagnoses for uncommon pathologies. In addition, it failed to identify an obvious error in presenting pathology.
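The arithmetic behind the reported figures can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and the example ratings are hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not say which form was used.

```python
from statistics import mean, stdev


def accuracy_percent(correct: int, total: int) -> float:
    """Return the share of correctly diagnosed cases as a percentage."""
    return 100.0 * correct / total


def summarize_likert(scores: list[int]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of 5-point Likert ratings.

    Assumption: sample SD (n - 1 denominator); the abstract does not specify.
    """
    return mean(scores), stdev(scores)


# 7 of 9 vignettes diagnosed correctly, as reported in the abstract.
print(f"Diagnostic accuracy: {accuracy_percent(7, 9):.0f}%")  # prints 78%

# Hypothetical ratings for one criterion; NOT data from the study.
example_ratings = [5, 4, 2, 5, 3]
avg, sd = summarize_likert(example_ratings)
print(f"Example score: {avg:.1f} +/- {sd:.1f}")
```

With only nine vignettes, each case shifts the accuracy rate by about 11 percentage points, which is worth keeping in mind when reading the 78% figure.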
Pages: 10