Harnessing Large Language Models for Structured Reporting in Breast Ultrasound: A Comparative Study of Open AI (GPT-4.0) and Microsoft Bing (GPT-4)

Cited by: 5

Authors:

Liu, ChaoXu [1,2]

Wei, MinYan [1,2]

Qin, Yu [1,2]

Zhang, MeiXiang [1,2]

Jiang, Huan [1,2]

Xu, JiaLe [1,2]

Zhang, YuNing [1,2]

Hua, Qing [1,2]

Hou, YiQing [1,2]

Dong, YiJie [1,2]

Xia, ShuJun [1,2]

Li, Ning [3]

Zhou, JianQiao [1,2]

Affiliations:

[1] Shanghai Jiao Tong Univ, Ruijin Hosp, Dept Ultrasound, Sch Med, 197 Ruijin Er Rd, Shanghai 200025, Peoples R China

[2] Shanghai Jiao Tong Univ, Coll Hlth Sci & Technol, Sch Med, Shanghai, Peoples R China

[3] Dali Univ, Affiliated Hosp 7, Yunnan Kungang Hosp, Dept Ultrasound, Anning, Yunnan, Peoples R China

Source:

ULTRASOUND IN MEDICINE AND BIOLOGY | 2024 / Volume 50 / Issue 11

Funding:

National Natural Science Foundation of China;

Keywords:

Ultrasound; BIRADS; Large language models; GPT-4; Breast cancer; Reporting; Performance;

DOI:

10.1016/j.ultrasmedbio.2024.07.007

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206; 082403;

Abstract:

Objectives: To assess the capabilities of large language models (LLMs), including Open AI (GPT-4.0) and Microsoft Bing (GPT-4), in generating structured reports, Breast Imaging Reporting and Data System (BI-RADS) categories, and management recommendations from free-text breast ultrasound reports.

Materials and Methods: In this retrospective study, 100 free-text breast ultrasound reports from patients who underwent surgery between January and May 2023 were gathered. The capabilities of Open AI (GPT-4.0) and Microsoft Bing (GPT-4) to convert these unstructured reports into structured ultrasound reports were studied. Senior radiologists evaluated the quality of the structured reports, BI-RADS categories, and management recommendations generated by GPT-4.0 and Bing against the guidelines.

Results: Open AI (GPT-4.0) outperformed Microsoft Bing (GPT-4) in generating structured reports (88% vs. 55%; p < 0.001), assigning correct BI-RADS categories (54% vs. 47%; p = 0.013), and providing reasonable management recommendations (81% vs. 63%; p < 0.001). In predicting benign versus malignant status, GPT-4.0 performed significantly better than Bing (AUC, 0.9317 vs. 0.8177; p < 0.001), while both performed significantly worse than senior radiologists (AUC, 0.9763; both p < 0.001).

Conclusion: This study highlights the potential of LLMs, specifically Open AI (GPT-4.0), in converting unstructured breast ultrasound reports into structured ones, offering accurate diagnoses and providing reasonable recommendations.


Pages: 1697-1703

Page count: 7

References (24 entries in total):

[1] Leveraging GPT-4 for Post Hoc Transformation of Free-text Radiology Reports into Structured Reporting: A Multilingual Feasibility Study [J].