Harnessing Large Language Models for Structured Reporting in Breast Ultrasound: A Comparative Study of Open AI (GPT-4.0) and Microsoft Bing (GPT-4)

Times cited: 1
Authors
Liu, ChaoXu [1, 2]
Wei, MinYan [1, 2]
Qin, Yu [1, 2]
Zhang, MeiXiang [1, 2]
Jiang, Huan [1, 2]
Xu, JiaLe [1, 2]
Zhang, YuNing [1, 2]
Hua, Qing [1, 2]
Hou, YiQing [1, 2]
Dong, YiJie [1, 2]
Xia, ShuJun [1, 2]
Li, Ning [3]
Zhou, JianQiao [1, 2]
Affiliations
[1] Shanghai Jiao Tong Univ, Ruijin Hosp, Dept Ultrasound, Sch Med, 197 Ruijin Er Rd, Shanghai 200025, Peoples R China
[2] Shanghai Jiao Tong Univ, Coll Hlth Sci & Technol, Sch Med, Shanghai, Peoples R China
[3] Dali Univ, Affiliated Hosp 7, Yunnan Kungang Hosp, Dept Ultrasound, Anning, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Ultrasound; BIRADS; Large language models; GPT-4; Breast cancer; Reporting; Performance
DOI
10.1016/j.ultrasmedbio.2024.07.007
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Objectives: To assess the capabilities of large language models (LLMs), including Open AI (GPT-4.0) and Microsoft Bing (GPT-4), in generating structured reports, Breast Imaging Reporting and Data System (BI-RADS) categories, and management recommendations from free-text breast ultrasound reports.

Materials and Methods: In this retrospective study, 100 free-text breast ultrasound reports were collected from patients who underwent surgery between January and May 2023. The ability of Open AI (GPT-4.0) and Microsoft Bing (GPT-4) to convert these unstructured reports into structured ultrasound reports was evaluated. Senior radiologists assessed the quality of the structured reports, BI-RADS categories, and management recommendations generated by GPT-4.0 and Bing according to the guidelines.

Results: Open AI (GPT-4.0) outperformed Microsoft Bing (GPT-4) in generating structured reports (88% vs. 55%; p < 0.001), assigning correct BI-RADS categories (54% vs. 47%; p = 0.013), and providing reasonable management recommendations (81% vs. 63%; p < 0.001). In distinguishing benign from malignant lesions, GPT-4.0 performed significantly better than Bing (AUC, 0.9317 vs. 0.8177; p < 0.001), although both performed significantly worse than senior radiologists (AUC, 0.9763; both p < 0.001).

Conclusion: This study highlights the potential of LLMs, particularly Open AI (GPT-4.0), to convert unstructured breast ultrasound reports into structured ones, offer accurate diagnoses, and provide reasonable management recommendations.
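The abstract reports AUCs for benign-versus-malignant prediction but does not state how they were computed. The following is a minimal sketch, not the authors' method, assuming each report's BI-RADS category is treated as an ordinal malignancy score and compared against surgical pathology. Under that assumption the AUC equals the probability that a randomly chosen malignant lesion receives a higher category than a randomly chosen benign one, with ties counting one half (the Mann-Whitney formulation). The category-to-rank mapping `BIRADS_ORDER`, the function name `auc_from_categories`, and the example data are all hypothetical.

```python
from itertools import product

# Hypothetical ordinal ranking of BI-RADS categories (an assumption, not from the study).
BIRADS_ORDER = {"2": 0, "3": 1, "4A": 2, "4B": 3, "4C": 4, "5": 5}


def auc_from_categories(categories, is_malignant):
    """Mann-Whitney estimate of the ROC AUC for an ordinal score.

    categories: BI-RADS category strings, one per lesion.
    is_malignant: booleans from pathology, aligned with categories.
    """
    scores = [BIRADS_ORDER[c] for c in categories]
    pos = [s for s, m in zip(scores, is_malignant) if m]      # malignant lesions
    neg = [s for s, m in zip(scores, is_malignant) if not m]  # benign lesions
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign lesion")

    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0   # malignant lesion ranked above benign lesion
        elif p == n:
            wins += 0.5   # tied categories count as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    cats = ["3", "4A", "4A", "5", "4B", "2"]
    malignant = [False, False, True, True, False, False]
    print(auc_from_categories(cats, malignant))  # 0.8125
```

In the example, the malignant 4A lesion ties one benign 4A lesion and ranks below the benign 4B lesion, so 6.5 of 8 malignant-benign pairs are ordered correctly. Because GPT-4.0, Bing, and the radiologists scored the same lesions, their AUCs are correlated; paired comparisons of this kind are often made with DeLong's test, although the abstract does not name the test the authors used.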
Pages: 1697-1703
Number of pages: 7