Practical Evaluation of ChatGPT Performance for Radiology Report Generation

Cited by: 2
Authors
Soleimani, Mohsen [1]
Seyyedi, Navisa [1]
Ayyoubzadeh, Seyed Mohammad [1, 2]
Kalhori, Sharareh Rostam Niakan [1, 3, 4]
Keshavarz, Hamidreza [5]
Affiliations
[1] Univ Tehran Med Sci, Sch Allied Med Sci, Dept Hlth Informat Management & Med Informat, Tehran, Iran
[2] Univ Tehran Med Sci, Hlth Informat Management Res Ctr, Tehran, Iran
[3] TU Braunschweig, Peter L Reichertz Inst Med Informat, Braunschweig, Germany
[4] Hannover Med Sch, Braunschweig, Germany
[5] Tarbiat Modares Univ, Fac Elect & Comp Engn, Tehran, Iran
Keywords
Radiology report generation; Large Language Model; ChatGPT; AI-assisted radiology; NLP-based evaluation; CLASSIFICATION;
DOI
10.1016/j.acra.2024.07.020
Chinese Library Classification (CLC) number
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Discipline classification code
1002; 100207; 1009
Abstract
Rationale and Objectives: Generating radiology reports is often time-consuming and labor-intensive, and the reports are prone to incompleteness, heterogeneity, and errors. Using natural language processing (NLP)-based techniques, this study explores whether ChatGPT (Generative Pre-trained Transformer), a prominent large language model (LLM), can make radiology report generation more efficient.

Materials and Methods: Using a sample of 1000 records from the Medical Information Mart for Intensive Care (MIMIC) Chest X-ray Database, this investigation employed Claude.ai to extract initial radiological report keywords. ChatGPT then generated radiology reports using a consistent three-step prompt template. Various lexical and sentence similarity techniques were employed to evaluate the correspondence between the AI assistant-generated reports and reference reports authored by medical professionals.

Results: Performance varied among NLP models. BART (Bidirectional and Auto-Regressive Transformers) and XLM (Cross-lingual Language Model) displayed high proficiency (mean similarity scores up to 99.3%), closely mirroring physician reports. Conversely, DeBERTa (Decoding-enhanced BERT with disentangled attention) and sequence-matching models scored lower, indicating less alignment with medical language. In the Impression section, the Word-Embedding model excelled with a mean similarity of 84.4%, while others, such as the Jaccard index, showed lower performance.

Conclusion: Overall, the study highlights significant variations across NLP models in their ability to generate radiology reports consistent with medical professionals' language. Pairwise comparisons and Kruskal-Wallis tests confirmed these differences, emphasizing the need for careful selection and evaluation of NLP models in radiology report generation. This research underscores the potential of ChatGPT to streamline and improve the radiology reporting process, with implications for enhancing efficiency and accuracy in clinical practice.
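
The abstract names the Jaccard index and sequence matching among the lexical similarity techniques used to compare generated and reference reports. The following is a minimal sketch of how two such scores could be computed, not the authors' implementation: the tokenizer, the function names, and the two example sentences are assumptions made purely for illustration, and only the Python standard library is used.

```python
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation.

    Assumption: a deliberately simple tokenizer; the study's preprocessing
    is not described in the abstract.
    """
    tokens = (word.strip(".,;:()") for word in text.lower().split())
    return [t for t in tokens if t]


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard index: shared unique tokens divided by all unique tokens."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def sequence_similarity(a: str, b: str) -> float:
    """Order-sensitive similarity over token sequences (difflib ratio)."""
    return SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()


# Hypothetical sentences, not taken from MIMIC or from the paper.
reference = "No acute cardiopulmonary process."
generated = "No acute cardiopulmonary abnormality."

print(f"Jaccard:  {jaccard_similarity(reference, generated):.2f}")
print(f"Sequence: {sequence_similarity(reference, generated):.2f}")
```

Both scores depend only on surface tokens, which is one plausible reason purely lexical measures can rate a clinically equivalent paraphrase lower than embedding-based measures do.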
Pages: 4823-4832
Page count: 10