Development and validation of open-source deep neural networks for comprehensive chest x-ray reading: a retrospective, multicentre study

Cited by: 12

Authors:

Cid, Yashin Dicente ^{[1]}

Macpherson, Matthew ^{[1,2]}

Gervais-Andre, Louise ^{[5]}

Zhu, Yuanyi ^{[1,2]}

Franco, Giuseppe ^{[1]}

Santeramo, Ruggiero ^{[1]}

Lim, Chee ^{[6]}

Selby, Ian ^{[7]}

Muthuswamy, Keerthini ^{[8]}

Amlani, Ashik ^{[8]}

Hopewell, Heath ^{[9]}

Indrajeet, Das ^{[9]}

Liakata, Maria ^{[10,12]}

Hutchinson, Charles E. ^{[3,11]}

Goh, Vicky ^{[5,8]}

Montana, Giovanni ^{[1,4,10]}

Affiliations:

[1] Univ Warwick, WMG, Coventry, England

[2] Univ Warwick, Math Inst, Coventry, England

[3] Univ Warwick, Warwick Med Sch, Coventry, England

[4] Univ Warwick, Dept Stat, Coventry CV4 7AL, England

[5] Kings Coll London, Sch Biomed Engn & Imaging Sci, London, England

[6] Univ Hosp Birmingham NHS Fdn Trust, Dept Radiol, Birmingham, England

[7] Univ Cambridge, Dept Radiol, Cambridge, England

[8] Guys & St Thomas NHS Fdn Trust, Dept Radiol, London, England

[9] Univ Hosp Leicester NHS Trust, Dept Radiol, Leicester, England

[10] Alan Turing Inst, London, England

[11] Univ Hosp Coventry & Warwickshire NHS Trust, Dept Radiol, Coventry, England

[12] Queen Mary Univ London, Inst Appl Data Sci, London, England

Source:

LANCET DIGITAL HEALTH | 2024 / Volume 6 / Issue 01

Funding:

Wellcome Trust (UK);

Keywords:

Classification (of information) - Clinical research - Deep neural networks - Large dataset - Natural language processing systems - Open systems;

DOI:

10.1016/S2589-7500(23)00218-2

Chinese Library Classification (CLC) number:

R-058;

Subject classification code:

Abstract:

Background: Artificial intelligence (AI) systems for automated chest x-ray interpretation hold promise for standardising reporting and reducing delays in health systems with shortages of trained radiologists. Yet, there are few freely accessible AI systems trained on large datasets for practitioners to use with their own data with a view to accelerating clinical deployment of AI systems in radiology. We aimed to contribute an AI system for comprehensive chest x-ray abnormality detection.

Methods: In this retrospective cohort study, we developed open-source neural networks, X-Raydar and X-Raydar-NLP, for classifying common chest x-ray findings from images and their free-text reports. Our networks were developed using data from six UK hospitals from three National Health Service (NHS) Trusts (University Hospitals Coventry and Warwickshire NHS Trust, University Hospitals Birmingham NHS Foundation Trust, and University Hospitals Leicester NHS Trust) collectively contributing 2 513 546 chest x-ray studies taken from a 13-year period (2006-19), which yielded 1 940 508 usable free-text radiological reports written by the contemporary assessing radiologist (collectively referred to as the "historic reporters") and 1 896 034 frontal images. Chest x-rays were labelled using a taxonomy of 37 findings by a custom-trained natural language processing (NLP) algorithm, X-Raydar-NLP, from the original free-text reports. X-Raydar-NLP was trained on 23 230 manually annotated reports and tested on 4551 reports from all hospitals. 1 694 921 labelled images from the training set and 89 238 from the validation set were then used to train a multi-label image classifier. Our algorithms were evaluated on three retrospective datasets: a set of exams sampled randomly from the full NHS dataset reported during clinical practice and annotated using NLP (n=103 328); a consensus set sampled from all six hospitals annotated by three expert radiologists (two independent annotators for each image and a third consultant to facilitate disagreement resolution) under research conditions (n=1427); and an independent dataset, MIMIC-CXR, consisting of NLP-annotated exams (n=252 374).

Findings: X-Raydar achieved a mean AUC of 0.919 (SD 0.039) on the auto-labelled set, 0.864 (0.102) on the consensus set, and 0.842 (0.074) on the MIMIC-CXR test, demonstrating similar performance to the historic clinical radiologist reporters, as assessed on the consensus set, for multiple clinically important findings, including pneumothorax, parenchymal opacification, and parenchymal mass or nodules. On the consensus set, X-Raydar outperformed historical reporter balanced accuracy with significance on 27 of 37 findings, was non-inferior on nine, and inferior on one finding, resulting in an average improvement of 13.3% (SD 13.1) to 0.763 (0.110), including a mean 5.6% (13.2) improvement in critical findings to 0.826 (0.119).

Interpretation: Our study shows that automated classification of chest x-rays under a comprehensive taxonomy can achieve performance levels similar to those of historical reporters and exhibit robust generalisation to external data. The open-sourced neural networks can serve as foundation models for further research and are freely available to the research community.

Funding: Wellcome Trust.

Copyright © 2023 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY-NC-ND 4.0 license.
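
Note on the metrics: the mean AUC and balanced accuracy figures in the abstract are per-finding scores summarised across the taxonomy. The following is a minimal sketch of how such a summary could be computed, not the authors' evaluation code. The finding names, toy labels and scores, the 0.5 decision threshold, the use of the sample standard deviation, and all function names are illustrative assumptions.

```python
from statistics import mean, stdev


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for one binary finding."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarise(values):
    """Mean and sample SD across findings (the SD convention is an assumption)."""
    return mean(values), stdev(values)


# Toy multi-label data: one list of labels and scores per finding (hypothetical).
labels = {"pneumothorax": [1, 0, 1, 0], "nodule": [0, 0, 1, 1]}
scores = {"pneumothorax": [0.9, 0.2, 0.6, 0.7], "nodule": [0.1, 0.4, 0.8, 0.9]}
THRESHOLD = 0.5  # assumed operating point, not taken from the paper

per_finding_auc = [auc(labels[f], scores[f]) for f in labels]
per_finding_bacc = [
    balanced_accuracy(labels[f], [int(s >= THRESHOLD) for s in scores[f]])
    for f in labels
]

mean_auc, sd_auc = summarise(per_finding_auc)
mean_bacc, sd_bacc = summarise(per_finding_bacc)
print(f"mean AUC {mean_auc:.3f} (SD {sd_auc:.3f})")
print(f"mean balanced accuracy {mean_bacc:.3f} (SD {sd_bacc:.3f})")
```

Balanced accuracy depends on the chosen threshold, whereas AUC does not, which is one reason a study may report both.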

Pages: e44-e57

Page count: 14
