From Revisions to Insights: Converting Radiology Report Revisions into Actionable Educational Feedback Using Generative AI Models

Cited by: 5

Authors:

Lyo, Shawn [1]

Mohan, Suyash [1]

Hassankhani, Alvand [1]

Noor, Abass [1]

Dako, Farouk [1]

Cook, Tessa [1]

Affiliation:

[1] Hosp Univ Penn, Dept Radiol, Philadelphia, PA 19104 USA

Source:

JOURNAL OF IMAGING INFORMATICS IN MEDICINE | 2025, Vol. 38, No. 02

Keywords:

Radiology training; Generative artificial intelligence; Large language models; Report revisions; Education; Precision radiology education

DOI:

10.1007/s10278-024-01233-4

Chinese Library Classification (CLC) codes:

R8 [Special medicine]; R445 [Diagnostic imaging]

Discipline classification codes:

1002; 100207; 1009

Abstract:

Expert feedback on trainees' preliminary reports is crucial for radiologic training, but real-time feedback can be challenging due to non-contemporaneous, remote reading and increasing imaging volumes. Trainee report revisions contain valuable educational feedback, but synthesizing data from raw revisions is challenging. Generative AI models can potentially analyze these revisions and provide structured, actionable feedback. This study used the OpenAI GPT-4 Turbo API to analyze paired synthesized and open-source analogs of preliminary and finalized reports, identify discrepancies, categorize their severity and type, and suggest review topics. Expert radiologists reviewed the output by grading discrepancies and by evaluating the accuracy of the assigned severity and category and the relevance of the suggested review topics. The reproducibility of discrepancy detection and maximal discrepancy severity was also examined. The model exhibited high sensitivity, detecting significantly more discrepancies than radiologists (W = 19.0, p < 0.001) with a strong positive correlation (r = 0.778, p < 0.001). Interrater reliability for severity and type was fair (Fleiss' kappa = 0.346 and 0.340, respectively; weighted kappa = 0.622 for severity). The LLM achieved a weighted F1 score of 0.66 for severity and 0.64 for type. Generated teaching points were considered relevant in approximately 85% of cases, and relevance correlated with the maximal discrepancy severity (Spearman rho = 0.76, p < 0.001). The reproducibility was moderate to good (ICC(2,1) = 0.690) for the number of discrepancies and substantial for maximal discrepancy severity (Fleiss' kappa = 0.718; weighted kappa = 0.94). Generative AI models can effectively identify discrepancies in report revisions and generate relevant educational feedback, offering promise for enhancing radiology training.
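For readers unfamiliar with the agreement statistic quoted in the abstract, the following is a minimal sketch of how Fleiss' kappa is computed from a table of rating counts. It is not the authors' analysis code; the function name `fleiss_kappa` and the toy rating table are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    counts[i][j] is the number of raters who assigned subject i to
    category j. Every subject must have the same number of raters (>= 2).
    """
    if not counts:
        raise ValueError("need at least one rated subject")
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of raters")

    n_subjects = len(counts)
    n_categories = len(counts[0])

    # Observed agreement: mean over subjects of the fraction of rater
    # pairs that agree on that subject.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects

    # Chance agreement: sum of squared overall category proportions.
    totals = [sum(row[j] for row in counts) for j in range(n_categories)]
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is identical")

    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical toy data: 4 discrepancies, 3 raters, 3 severity levels.
toy_ratings = [
    [3, 0, 0],
    [0, 3, 0],
    [1, 2, 0],
    [0, 1, 2],
]
kappa = fleiss_kappa(toy_ratings)
```

A value of 1 indicates perfect agreement and values near 0 indicate agreement no better than chance. Unweighted Fleiss' kappa treats all disagreements equally, which may be why the abstract also reports a weighted kappa for the ordinal severity scale.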

Citation

Pages: 1265-1279

Page count: 15

