Potential of ChatGPT and GPT-4 for Data Mining of Free-Text CT Reports on Lung Cancer

Cited by: 131
Authors
Fink, Matthias A. [1, 3, 4]
Bischoff, Arved [1, 3, 4]
Fink, Christoph A. [2]
Moll, Martin [1]
Kroschke, Jonas [1]
Dulz, Luca [1, 3, 4]
Heussel, Claus Peter [1, 3, 4, 5]
Kauczor, Hans-Ulrich [1, 3, 4]
Weber, Tim F. [1, 3, 4]
Affiliations
[1] Univ Hosp Heidelberg, Clin Diagnost & Intervent Radiol, Neuenheimer Feld 420, D-69120 Heidelberg, Germany
[2] Univ Hosp Heidelberg, Dept Radiat Oncol, Neuenheimer Feld 420, D-69120 Heidelberg, Germany
[3] Translat Lung Res Ctr Heidelberg, Heidelberg, Germany
[4] German Ctr Lung Res, Heidelberg, Germany
[5] Heidelberg Univ, Dept Diagnost & Intervent Radiol Nucl Med, Heidelberg Thorac Clin, Heidelberg, Germany
DOI
10.1148/radiol.231362
Chinese Library Classification (CLC) number
R8 [Special Medicine]; R445 [Diagnostic Imaging];
Discipline classification code
1002; 100207; 1009;
Abstract
Background: The latest large language models (LLMs) solve unseen problems via user-defined text prompts without the need for retraining, offering potentially more efficient information extraction from free-text medical records than manual annotation.
Purpose: To compare the performance of the LLMs ChatGPT and GPT-4 in data mining and labeling oncologic phenotypes from free-text CT reports on lung cancer by using user-defined prompts.
Materials and Methods: This retrospective study included patients who underwent lung cancer follow-up CT between September 2021 and March 2023. A subset of 25 reports was reserved for prompt engineering to instruct the LLMs in extracting lesion diameters, labeling metastatic disease, and assessing oncologic progression. This output was fed into a rule-based natural language processing pipeline to match ground truth annotations from four radiologists and derive performance metrics. The oncologic reasoning of LLMs was rated on a five-point Likert scale for factual correctness and accuracy. The occurrence of confabulations was recorded. Statistical analyses included Wilcoxon signed rank and McNemar tests.
Results: On 424 CT reports from 424 patients (mean age, 65 years ± 11 [SD]; 265 male), GPT-4 outperformed ChatGPT in extracting lesion parameters (98.6% vs 84.0%, P < .001), resulting in 96% correctly mined reports (vs 67% for ChatGPT, P < .001). GPT-4 achieved higher accuracy in identification of metastatic disease (98.1% [95% CI: 97.7, 98.5] vs 90.3% [95% CI: 89.4, 91.0]) and higher performance in generating correct labels for oncologic progression (F1 score, 0.96 [95% CI: 0.94, 0.98] vs 0.91 [95% CI: 0.89, 0.94]) (both P < .001). In oncologic reasoning, GPT-4 had higher Likert scale scores for factual correctness (4.3 vs 3.9) and accuracy (4.4 vs 3.3), with a lower rate of confabulation (1.7% vs 13.7%) than ChatGPT (all P < .001).
Conclusion: When using user-defined prompts, GPT-4 outperformed ChatGPT in extracting oncologic phenotypes from free-text CT reports on lung cancer and demonstrated better oncologic reasoning with fewer confabulations. © RSNA, 2023
Pages: 9