Harnessing Large Language Models for Structured Reporting in Breast Ultrasound: A Comparative Study of Open AI (GPT-4.0) and Microsoft Bing (GPT-4)

Cited by: 5

Authors:

Liu, ChaoXu [1,2]

Wei, MinYan [1,2]

Qin, Yu [1,2]

Zhang, MeiXiang [1,2]

Jiang, Huan [1,2]

Xu, JiaLe [1,2]

Zhang, YuNing [1,2]

Hua, Qing [1,2]

Hou, YiQing [1,2]

Dong, YiJie [1,2]

Xia, ShuJun [1,2]

Li, Ning [3]

Zhou, JianQiao [1,2]

Affiliations:

[1] Shanghai Jiao Tong Univ, Ruijin Hosp, Dept Ultrasound, Sch Med, 197 Ruijin Er Rd, Shanghai 200025, Peoples R China

[2] Shanghai Jiao Tong Univ, Coll Hlth Sci & Technol, Sch Med, Shanghai, Peoples R China

[3] Dali Univ, Affiliated Hosp 7, Yunnan Kungang Hosp, Dept Ultrasound, Anning, Yunnan, Peoples R China

Source:

ULTRASOUND IN MEDICINE AND BIOLOGY | 2024 / Vol. 50 / Issue 11

Funding:

National Natural Science Foundation of China;

Keywords:

Ultrasound; BIRADS; Large language models; GPT-4; Breast cancer; Reporting; Performance;

DOI:

10.1016/j.ultrasmedbio.2024.07.007

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Objectives: To assess the capabilities of large language models (LLMs), including Open AI (GPT-4.0) and Microsoft Bing (GPT-4), in generating structured reports, Breast Imaging Reporting and Data System (BI-RADS) categories, and management recommendations from free-text breast ultrasound reports.

Materials and Methods: In this retrospective study, 100 free-text breast ultrasound reports from patients who underwent surgery between January and May 2023 were collected. The ability of Open AI (GPT-4.0) and Microsoft Bing (GPT-4) to convert these unstructured reports into structured ultrasound reports was studied. The quality of the structured reports, BI-RADS categories, and management recommendations generated by GPT-4.0 and Bing was evaluated by senior radiologists based on the guidelines.

Results: Open AI (GPT-4.0) outperformed Microsoft Bing (GPT-4) in generating structured reports (88% vs. 55%; p < 0.001), assigning correct BI-RADS categories (54% vs. 47%; p = 0.013), and providing reasonable management recommendations (81% vs. 63%; p < 0.001). In predicting benign versus malignant characteristics, GPT-4.0 performed significantly better than Bing (AUC, 0.9317 vs. 0.8177; p < 0.001), while both performed significantly worse than senior radiologists (AUC, 0.9763; both p < 0.001).

Conclusion: This study highlights the potential of LLMs, specifically Open AI (GPT-4.0), in converting unstructured breast ultrasound reports into structured ones, offering accurate diagnoses and providing reasonable recommendations.


Pages: 1697-1703

Page count: 7
