Harnessing Large Language Models for Structured Reporting in Breast Ultrasound: A Comparative Study of Open AI (GPT-4.0) and Microsoft Bing (GPT-4)

Cited by: 1
Authors
Liu, ChaoXu [1, 2]
Wei, MinYan [1, 2]
Qin, Yu [1, 2]
Zhang, MeiXiang [1, 2]
Jiang, Huan [1, 2]
Xu, JiaLe [1, 2]
Zhang, YuNing [1, 2]
Hua, Qing [1, 2]
Hou, YiQing [1, 2]
Dong, YiJie [1, 2]
Xia, ShuJun [1, 2]
Li, Ning [3]
Zhou, JianQiao [1, 2]
Affiliations
[1] Shanghai Jiao Tong Univ, Ruijin Hosp, Dept Ultrasound, Sch Med, 197 Ruijin Er Rd, Shanghai 200025, Peoples R China
[2] Shanghai Jiao Tong Univ, Coll Hlth Sci & Technol, Sch Med, Shanghai, Peoples R China
[3] Dali Univ, Affiliated Hosp 7, Yunnan Kungang Hosp, Dept Ultrasound, Anning, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Ultrasound; BIRADS; Large language models; GPT-4; Breast cancer; Reporting; Performance;
DOI
10.1016/j.ultrasmedbio.2024.07.007
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Objectives: To assess the capabilities of large language models (LLMs), specifically Open AI (GPT-4.0) and Microsoft Bing (GPT-4), in generating structured reports, Breast Imaging Reporting and Data System (BI-RADS) categories, and management recommendations from free-text breast ultrasound reports.
Materials and Methods: In this retrospective study, 100 free-text breast ultrasound reports from patients who underwent surgery between January and May 2023 were collected. The ability of Open AI (GPT-4.0) and Microsoft Bing (GPT-4) to convert these unstructured reports into structured ultrasound reports was studied. Senior radiologists evaluated the quality of the structured reports, BI-RADS categories, and management recommendations generated by GPT-4.0 and Bing against the guidelines.
Results: Open AI (GPT-4.0) outperformed Microsoft Bing (GPT-4) in generating structured reports (88% vs. 55%; p < 0.001), assigning correct BI-RADS categories (54% vs. 47%; p = 0.013), and providing reasonable management recommendations (81% vs. 63%; p < 0.001). In distinguishing benign from malignant lesions, GPT-4.0 performed significantly better than Bing (AUC, 0.9317 vs. 0.8177; p < 0.001), while both performed significantly worse than senior radiologists (AUC, 0.9763; both p < 0.001).
Conclusion: This study highlights the potential of LLMs, specifically Open AI (GPT-4.0), in converting unstructured breast ultrasound reports into structured ones, offering accurate diagnoses, and providing reasonable recommendations.
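The abstract reports diagnostic performance as an area under the ROC curve (AUC) for separating benign from malignant lesions. As a minimal sketch of how such a figure can be derived from ordinal BI-RADS categories, the Python below computes AUC as the probability that a malignant case receives a higher category than a benign one, with ties counted as one half. The category-to-rank mapping, the function name `auc_from_scores`, and the toy data are all assumptions made for illustration; they are not taken from the study, and the record does not state how the authors computed AUC.

```python
from itertools import product


def auc_from_scores(scores, labels):
    """Return AUC as P(malignant score > benign score), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # malignant cases
    neg = [s for s, y in zip(scores, labels) if y == 0]  # benign cases
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    # Compare every malignant case against every benign case.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Assumed ordinal ranking of BI-RADS categories (illustrative only).
BIRADS_RANK = {"2": 0, "3": 1, "4A": 2, "4B": 3, "4C": 4, "5": 5}

# Hypothetical toy data, NOT from the study: assigned category and
# pathology outcome (1 = malignant, 0 = benign) for eight lesions.
categories = ["3", "4A", "4C", "5", "4B", "3", "4A", "5"]
malignant = [0, 0, 1, 1, 1, 0, 1, 1]

scores = [BIRADS_RANK[c] for c in categories]
toy_auc = auc_from_scores(scores, malignant)
print(f"toy AUC = {toy_auc:.4f}")
```

The pairwise formulation is quadratic in the number of cases, which is acceptable for a sample of around 100 reports; a rank-based implementation would be preferable for larger datasets.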
Pages: 1697-1703
Number of pages: 7