Assessing Completeness of Clinical Histories Accompanying Imaging Orders Using Adapted Open-Source and Closed-Source Large Language Models

Cited by: 0
Authors
Larson, David B. [1 ,2 ]
Koirala, Arogya [2 ]
Cheuy, Lina Y. [1 ,2 ]
Paschali, Magdalini [1 ,2 ]
Van Veen, Dave [3 ]
Na, Hye Sun [1 ,2 ]
Petterson, Matthew B. [1 ]
Fang, Zhongnan [1 ,2 ]
Chaudhari, Akshay S. [1 ,2 ,4 ]
Affiliations
[1] Stanford Univ, Dept Radiol, Sch Med, 453 Quarry Rd,MC 5659, Stanford, CA 94304 USA
[2] Stanford Univ, AI Dev & Evaluat Lab, Sch Med, Palo Alto, CA 94305 USA
[3] Stanford Univ, Dept Elect Engn, Stanford, CA USA
[4] Stanford Univ, Dept Biomed Data Sci, Stanford, CA USA
Keywords
DOI
10.1148/radiol.241051
Chinese Library Classification (CLC)
R8 [Special Medicine]; R445 [Diagnostic Imaging];
Discipline Codes
1002 ; 100207 ; 1009 ;
Abstract
Background: Incomplete clinical histories are a well-known problem in radiology. Previous dedicated quality improvement efforts focusing on reproducible assessment of the completeness of free-text clinical histories have relied on tedious manual analysis.

Purpose: To adapt and evaluate open-source and closed-source large language models (LLMs) for their ability to automatically extract clinical history elements from imaging orders, and to use the best-performing adapted open-source model to assess the completeness of a large sample of clinical histories as a benchmark for clinical practice.

Materials and Methods: This retrospective single-site study used previously extracted information accompanying CT, MRI, US, and radiography orders from August 2020 to May 2022 at the adult and pediatric emergency department of a 613-bed tertiary academic medical center. Two open-source LLMs (Llama 2-7B [Meta], Mistral-7B [Mistral AI]) and one closed-source LLM (GPT-4 Turbo [OpenAI]) were adapted using prompt engineering, in-context learning, and fine-tuning (open-source models only) to extract the elements "past medical history," "what," "when," "where," and "clinical concern" from clinical histories. Agreement between each model and the adjudicated manual annotations of two board-certified radiologists (with 16 and 3 years of postfellowship experience, respectively) was assessed using accuracy, Cohen kappa (none to slight, 0.01-0.20; fair, 0.21-0.40; moderate, 0.41-0.60; substantial, 0.61-0.80; almost perfect, 0.81-1.00), and BERTScore, an LLM-based metric that quantifies how well two pieces of text convey the same meaning; 95% CIs were also calculated. The best-performing open-source model was then used to assess completeness on a large dataset of unannotated clinical histories.

Results: A total of 50 186 clinical histories were included (794 training, 150 validation, 300 initial testing, 48 942 real-world application). Of the two open-source models, Mistral-7B outperformed Llama 2-7B in assessing completeness and was further fine-tuned. Both Mistral-7B and GPT-4 Turbo showed substantial overall agreement with radiologists (mean kappa, 0.73 [95% CI: 0.67, 0.78] to 0.77 [95% CI: 0.71, 0.82]) and with the adjudicated annotations (mean BERTScore, 0.96 [95% CI: 0.96, 0.97] for both models; P = .38). Mistral-7B also rivaled GPT-4 Turbo in accuracy (weighted overall mean, 91% [95% CI: 89, 93] vs 92% [95% CI: 90, 94]; P = .31) despite being a smaller model. Using Mistral-7B, 26.2% (12 803 of 48 942) of the unannotated clinical histories were found to contain all five elements.

Conclusion: An easily deployable fine-tuned open-source LLM (Mistral-7B) rivaled GPT-4 Turbo in performance, extracted clinical history elements with substantial agreement with radiologists, and produced a benchmark for the completeness of a large sample of clinical histories. The model and code will be fully open-sourced.
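For readers unfamiliar with the agreement statistic used above, the following is a minimal, self-contained Python sketch (illustrative only, not the study's code) of how Cohen's kappa can be computed for two raters and mapped to the agreement bands cited in the abstract. The binary present/absent labels in the usage example are hypothetical judgments for a single clinical history element.

```python
# Illustrative sketch: Cohen's kappa for two raters (e.g., a model vs a
# radiologist) labeling the same items, plus the interpretation bands
# reported in the abstract. Not the authors' implementation.

from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each rater's marginal label counts.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


def interpret(kappa):
    """Map a kappa value to the bands used in the study's abstract."""
    if kappa <= 0.20:
        return "none to slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical labels: 1 = element present, 0 = element absent.
model_labels = [1, 1, 0, 0, 1]
radiologist_labels = [1, 0, 0, 0, 1]
k = cohens_kappa(model_labels, radiologist_labels)
```

Note that kappa discounts agreement expected by chance, which is why it is preferred over raw accuracy when label prevalence is skewed (e.g., when most histories contain a given element).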
Pages: 11
Related Papers
50 records in total
  • [21] Enhancing Code Security Through Open-Source Large Language Models: A Comparative Study
    Ridley, Norah
    Branca, Enrico
    Kimber, Jadyn
    Stakhanova, Natalia
    FOUNDATIONS AND PRACTICE OF SECURITY, PT I, FPS 2023, 2024, 14551 : 233 - 249
  • [22] Evaluation of Open-Source Large Language Models for Metal-Organic Frameworks Research
    Bai, Xuefeng
    Xie, Yabo
    Zhang, Xin
    Han, Honggui
    Li, Jian-Rong
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2024, 64 (13) : 4958 - 4965
  • [23] Fine-Tuning and Evaluating Open-Source Large Language Models for the Army Domain
    Ruiz, Maj Daniel C.
    Sell, John
    arXiv
  • [24] Iterative Refactoring of Real-World Open-Source Programs with Large Language Models
    Choi, Jinsu
    An, Gabin
    Yoo, Shin
    SEARCH-BASED SOFTWARE ENGINEERING, SSBSE 2024, 2024, 14767 : 49 - 55
  • [25] Comparing Commercial and Open-Source Large Language Models for Labeling Chest Radiograph Reports
    Dorfner, Felix J.
    Juergensen, Liv
    Donle, Leonhard
    Al Mohamad, Fares
    Bodenmann, Tobias R.
    Cleveland, Mason C.
    Busch, Felix
    Adams, Lisa C.
    Sato, James
    Schultz, Thomas
    Kim, Albert E.
    Merkow, Jameson
    Bressem, Keno K.
    Bridge, Christopher P.
    RADIOLOGY, 2024, 313 (01)
  • [26] EAI-SIM: An Open-source Embodied AI Simulation Framework with Large Language Models
    Liu, Guocai
    Sun, Tao
    Li, Weihua
    Li, Xiaohui
    Liu, Xin
    Cui, Jinqiang
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION, ICCA 2024, 2024, : 994 - 999
  • [27] Benchmarking open-source large language models on Portuguese Revalida multiple-choice questions
    Severino, Joao Victor Bruneti
    de Paula, Pedro Angelo Basei
    Berger, Matheus Nespolo
    Loures, Filipe Silveira
    Todeschini, Solano Amadori
    Roeder, Eduardo Augusto
    Veiga, Maria Han
    Guedes, Murilo
    Marques, Gustavo Lenci
    BMJ HEALTH & CARE INFORMATICS, 2025, 32 (01)
  • [28] Analyzing Women's Contributions to Open-Source Software Projects based on Large Language Models
    Zhuang, Yuqian
    Zhang, Mingya
    Yang, Yiyuan
    Wang, Liang
    PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024, : 2363 - 2368
  • [29] Need of Fine-Tuned Radiology Aware Open-Source Large Language Models for Neuroradiology
    Ray, Partha Pratim
    CLINICAL NEURORADIOLOGY, 2024
  • [30] Toponym resolution leveraging lightweight and open-source large language models and geo-knowledge
    Hu, Xuke
    Kersten, Jens
    Klan, Friederike
    Farzana, Sheikh Mastura
    INTERNATIONAL JOURNAL OF GEOGRAPHICAL INFORMATION SCIENCE, 2024