Multi-label annotation of text reports from computed tomography of the chest, abdomen, and pelvis using deep learning

Cited by: 5
Authors
D'Anniballe, Vincent M. [1]
Tushar, Fakrul Islam [1, 2, 3]
Faryna, Khrystyna [3]
Han, Songyue [4]
Mazurowski, Maciej A. [1, 2]
Rubin, Geoffrey D. [1, 5]
Lo, Joseph Y. [1, 2]
Affiliations
[1] Duke Univ, Ctr Virtual Imaging Trials, Dept Radiol, Carl E Ravin Adv Imaging Labs,Sch Med, 2424 Erwin Rd Ste 302, Durham, NC 27705 USA
[2] Duke Univ, Pratt Sch Engn, Dept Elect & Comp Engn, Durham, NC 27705 USA
[3] Univ Girona, Erasmus Joint Master Med Imaging & Applicat, Girona, Spain
[4] South China Univ Technol, Sch Software Engn, Guangzhou, Guangdong, Peoples R China
[5] Univ Arizona, Dept Med Imaging, Coll Med, Tucson, AZ USA
Keywords
Weak supervision; Report labeling; Attention RNN; Rule-based algorithm; Natural language processing; Computed tomography; AUTOMATIC CLASSIFICATION; RADIOLOGY REPORTS;
DOI
10.1186/s12911-022-01843-4
Chinese Library Classification (CLC) number
R-058
Abstract
Background: There is progress to be made in building artificially intelligent systems to detect abnormalities that are not only accurate but can handle the true breadth of findings that radiologists encounter in body (chest, abdomen, and pelvis) computed tomography (CT). Currently, the major bottleneck for developing multi-disease classifiers is a lack of manually annotated data. The purpose of this work was to develop high throughput multi-label annotators for body CT reports that can be applied across a variety of abnormalities, organs, and disease states thereby mitigating the need for human annotation.
Methods: We used a dictionary approach to develop rule-based algorithms (RBA) for extraction of disease labels from radiology text reports. We targeted three organ systems (lungs/pleura, liver/gallbladder, kidneys/ureters) with four diseases per system based on their prevalence in our dataset. To expand the algorithms beyond pre-defined keywords, attention-guided recurrent neural networks (RNN) were trained using the RBA-extracted labels to classify reports as being positive for one or more diseases or normal for each organ system. Alternative effects on disease classification performance were evaluated using random initialization or pre-trained embedding as well as different sizes of training datasets. The RBA was tested on a subset of 2158 manually labeled reports and performance was reported as accuracy and F-score. The RNN was tested against a test set of 48,758 reports labeled by RBA and performance was reported as area under the receiver operating characteristic curve (AUC), with 95% CIs calculated using the DeLong method.
Results: Manual validation of the RBA confirmed 91-99% accuracy across the 15 different labels. Our models extracted disease labels from 261,229 radiology reports of 112,501 unique subjects. Pre-trained models outperformed random initialization across all diseases. As the training dataset size was reduced, performance was robust except for a few diseases with a relatively small number of cases. Pre-trained classification AUCs reached > 0.95 for all four disease outcomes and normality across all three organ systems.
Conclusions: Our label-extracting pipeline was able to encompass a variety of cases and diseases in body CT reports by generalizing beyond strict rules with exceptional accuracy. The method described can be easily adapted to enable automated labeling of hospital-scale medical data sets for training image-based disease classifiers.
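The dictionary-based, rule-driven labeling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the keyword lists, the negation cues, the `label_report` function, and the choice to call an organ system "normal" when no keyword matches are all hypothetical assumptions made for illustration.

```python
import re

# Hypothetical keyword dictionary (illustrative terms only, not the
# dictionary used in the paper): organ system -> disease -> keywords.
DICTIONARY = {
    "lungs/pleura": {
        "nodule": ["nodule", "nodules"],
        "effusion": ["pleural effusion"],
    },
    "liver/gallbladder": {
        "gallstones": ["gallstone", "cholelithiasis"],
    },
    "kidneys/ureters": {
        "kidney stone": ["renal calculus", "nephrolithiasis"],
    },
}

# Assumed negation cues; a keyword is ignored when one of these
# appears earlier in the same sentence.
NEGATION = re.compile(r"\b(no|without|negative for)\b")


def label_report(text):
    """Return {organ system: sorted disease labels, or ["normal"]}."""
    # Split the report into lowercase sentences on periods and newlines.
    sentences = [s.strip().lower() for s in re.split(r"[.\n]", text) if s.strip()]
    labels = {}
    for organ, diseases in DICTIONARY.items():
        found = set()
        for disease, keywords in diseases.items():
            for sentence in sentences:
                for keyword in keywords:
                    pos = sentence.find(keyword)
                    # Count the keyword only if no negation cue precedes it.
                    if pos != -1 and not NEGATION.search(sentence[:pos]):
                        found.add(disease)
        # Assumption: an organ system with no positive match is "normal".
        labels[organ] = sorted(found) if found else ["normal"]
    return labels


report = (
    "Small nodule in the right upper lobe. No pleural effusion. "
    "Cholelithiasis without cholecystitis. Kidneys are unremarkable."
)
print(label_report(report))
```

Each organ system receives either one or more disease labels or a single "normal" label, which mirrors the multi-label setup in the abstract (three organ systems, each with four diseases plus normal, giving 15 labels). Labels of this kind could then serve as weak supervision for a trained classifier.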
Pages: 12
Related papers
36 in total
  • [1] Bahdanau D, 2016, Arxiv, DOI arXiv:1409.0473
  • [2] Comparative effectiveness of convolutional neural network (CNN) and recurrent neural network (RNN) architectures for radiology text report classification
    Banerjee, Imon
    Ling, Yuan
    Chen, Matthew C.
    Hasan, Sadid A.
    Langlotz, Curtis P.
    Moradzadeh, Nathaniel
    Chapman, Brian
    Amrhein, Timothy
    Mong, David
    Rubin, Daniel L.
    Farri, Oladimeji
    Lungren, Matthew P.
    [J]. ARTIFICIAL INTELLIGENCE IN MEDICINE, 2019, 97 : 79 - 88
  • [3] Brady Adrian, 2012, Ulster Med J, V81, P3
  • [4] Using recurrent neural network models for early detection of heart failure onset
    Choi, Edward
    Schuetz, Andy
    Stewart, Walter F.
    Sun, Jimeng
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2017, 24 (02) : 361 - 370
  • [5] COMPARING THE AREAS UNDER 2 OR MORE CORRELATED RECEIVER OPERATING CHARACTERISTIC CURVES - A NONPARAMETRIC APPROACH
    DELONG, ER
    DELONG, DM
    CLARKEPEARSON, DI
    [J]. BIOMETRICS, 1988, 44 (03) : 837 - 845
  • [6] Deng L., 2018, DEEP LEARNING NATURA, DOI 10.1007/978-981-10-5209-5
  • [7] Machine-learning-based multiple abnormality prediction with large-scale chest computed tomography volumes
    Draelos, Rachel Lea
    Dov, David
    Mazurowski, Maciej A.
    Lo, Joseph Y.
    Henao, Ricardo
    Rubin, Geoffrey D.
    Carin, Lawrence
    [J]. MEDICAL IMAGE ANALYSIS, 2021, 67
  • [8] Application of recently developed computer algorithm for automatic classification of unstructured radiology reports: Validation study
    Dreyer, KJ
    Kalra, MK
    Maher, MM
    Hurier, AM
    Asfaw, BA
    Schultz, T
    Halpern, EF
    Thrall, JH
    [J]. RADIOLOGY, 2005, 234 (02) : 323 - 329
  • [9] Faryna K, P SPIE, V11314
  • [10] Structured Reporting in Radiology
    Ganeshan, Dhakshinamoorthy
    Phuong-Anh Thi Duong
    Probyn, Linda
    Lenchik, Leon
    McArthur, Tatum A.
    Retrouvey, Michele
    Ghobadi, Emily H.
    Desouches, Stephane L.
    Pastel, David
    Francis, Isaac R.
    [J]. ACADEMIC RADIOLOGY, 2018, 25 (01) : 66 - 73