Harnessing Large Language Models for Structured Reporting in Breast Ultrasound: A Comparative Study of Open AI (GPT-4.0) and Microsoft Bing (GPT-4)

Cited by: 5
Authors
Liu, ChaoXu [1, 2]
Wei, MinYan [1, 2]
Qin, Yu [1, 2]
Zhang, MeiXiang [1, 2]
Jiang, Huan [1, 2]
Xu, JiaLe [1, 2]
Zhang, YuNing [1, 2]
Hua, Qing [1, 2]
Hou, YiQing [1, 2]
Dong, YiJie [1, 2]
Xia, ShuJun [1, 2]
Li, Ning [3]
Zhou, JianQiao [1, 2]
Affiliations
[1] Shanghai Jiao Tong Univ, Ruijin Hosp, Dept Ultrasound, Sch Med, 197 Ruijin Er Rd, Shanghai 200025, Peoples R China
[2] Shanghai Jiao Tong Univ, Coll Hlth Sci & Technol, Sch Med, Shanghai, Peoples R China
[3] Dali Univ, Affiliated Hosp 7, Yunnan Kungang Hosp, Dept Ultrasound, Anning, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Ultrasound; BIRADS; Large language models; GPT-4; Breast cancer; Reporting; Performance;
DOI
10.1016/j.ultrasmedbio.2024.07.007
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Objectives: To assess the capabilities of large language models (LLMs), including Open AI (GPT-4.0) and Microsoft Bing (GPT-4), in generating structured reports, Breast Imaging Reporting and Data System (BI-RADS) categories, and management recommendations from free-text breast ultrasound reports.
Materials and Methods: In this retrospective study, 100 free-text breast ultrasound reports were collected from patients who underwent surgery between January and May 2023. Open AI (GPT-4.0) and Microsoft Bing (GPT-4) were each used to convert these unstructured reports into structured ultrasound reports. Senior radiologists evaluated the quality of the structured reports, BI-RADS categories, and management recommendations generated by GPT-4.0 and Bing against the guidelines.
Results: Open AI (GPT-4.0) outperformed Microsoft Bing (GPT-4) in generating structured reports (88% vs. 55%; p < 0.001), assigning correct BI-RADS categories (54% vs. 47%; p = 0.013), and providing reasonable management recommendations (81% vs. 63%; p < 0.001). In distinguishing benign from malignant lesions, GPT-4.0 performed significantly better than Bing (AUC, 0.9317 vs. 0.8177; p < 0.001), while both performed significantly worse than senior radiologists (AUC, 0.9763; both p < 0.001).
Conclusion: This study highlights the potential of LLMs, specifically Open AI (GPT-4.0), for converting unstructured breast ultrasound reports into structured ones, offering accurate diagnoses, and providing reasonable recommendations.
Pages: 1697-1703
Page count: 7