Classification of CT pulmonary angiography reports by presence, chronicity, and location of pulmonary embolism with natural language processing

Cited by: 27
Authors
Yu, Sheng [1, 2]
Kumamaru, Kanako K. [2, 3]
George, Elizabeth [2, 3]
Dunne, Ruth M. [2, 4]
Bedayat, Arash [2, 3, 5]
Neykov, Matey [6]
Hunsaker, Andetta R. [2, 4]
Dill, Karin E. [7]
Cai, Tianxi [6]
Rybicki, Frank J. [2, 3]
Affiliations
[1] Brigham & Womens Hosp, Partners HealthCare Personalized Med, Boston, MA 02115 USA
[2] Harvard Univ, Sch Med, Boston, MA USA
[3] Brigham & Womens Hosp, Dept Radiol, Appl Imaging Sci Lab, Boston, MA 02115 USA
[4] Brigham & Womens Hosp, Dept Radiol, Boston, MA 02115 USA
[5] Univ Massachusetts, Sch Med, Dept Radiol, Worcester, MA USA
[6] Harvard Univ, Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
[7] Univ Chicago, Dept Radiol, Chicago, IL 60637 USA
Funding
National Institutes of Health (NIH), USA;
Keywords
Natural language processing; NILE; Nested modification structure; Pulmonary embolism; CT pulmonary angiography; VENTRICULAR DIAMETER RATIOS; VENOUS THROMBOEMBOLISM; ABBREVIATIONS; VALIDATION; EXTRACTION; RADIOLOGY; ALGORITHM; MORTALITY; PROGRESS; DEFECTS;
DOI
10.1016/j.jbi.2014.08.001
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
In this paper we describe an efficient tool based on natural language processing (NLP) for classifying the detailed state of pulmonary embolism (PE) recorded in CT pulmonary angiography reports. The classification tasks are: PE present vs. absent, acute PE vs. others, central PE vs. others, and subsegmental PE vs. others. Statistical learning algorithms were trained with features extracted using the NLP tool and gold standard labels obtained via chart review by two radiologists. The areas under the receiver operating characteristic curves (AUC) for the four tasks were 0.998, 0.945, 0.987, and 0.986, respectively. We compared our classifiers with bag-of-words Naive Bayes classifiers, a standard text mining technology, which gave AUCs of 0.942, 0.765, 0.766, and 0.712, respectively. © 2014 Elsevier Inc. All rights reserved.
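As a rough illustration of the baseline and the metric named in the abstract, the sketch below fits a bag-of-words multinomial Naive Bayes classifier and computes AUC by pairwise comparison. It is not the authors' implementation and does not reproduce their NLP tool or feature set; the function names (`train_nb`, `score`, `auc`), the add-one smoothing, the whitespace tokenizer, and the four toy report sentences are all assumptions made up for this example.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Fit a two-class multinomial Naive Bayes model on bag-of-words counts."""
    vocab = {w for d in docs for w in d.lower().split()}
    counts = {0: Counter(), 1: Counter()}
    n_docs = Counter(labels)
    for d, y in zip(docs, labels):
        counts[y].update(d.lower().split())
    model = {"vocab": vocab, "prior": {}, "loglik": {}}
    for y in (0, 1):
        total = sum(counts[y].values())
        model["prior"][y] = math.log(n_docs[y] / len(docs))
        # Add-one (Laplace) smoothing so unseen words never get zero probability.
        model["loglik"][y] = {
            w: math.log((counts[y][w] + 1) / (total + len(vocab))) for w in vocab
        }
    return model


def score(model, doc):
    """Return the log-odds of class 1 vs. class 0; unknown words are ignored."""
    s = model["prior"][1] - model["prior"][0]
    for w in doc.lower().split():
        if w in model["vocab"]:
            s += model["loglik"][1][w] - model["loglik"][0][w]
    return s


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented toy sentences (label 1 = PE present, 0 = PE absent); not study data.
train_docs = [
    "acute pulmonary embolism present",
    "no pulmonary embolism",
    "filling defect in right main pulmonary artery",
    "no filling defect seen",
]
train_labels = [1, 0, 1, 0]
model = train_nb(train_docs, train_labels)

test_docs = ["acute embolism present", "no embolism"]
test_scores = [score(model, d) for d in test_docs]
print(test_scores, auc(test_scores, [1, 0]))
```

Because a bag-of-words model treats each token independently, a negation such as "no" only shifts the score by its own weight. That is one plausible reason such a baseline could lag behind a classifier built on structured NLP features, although the abstract itself does not analyze the cause of the gap.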
Pages: 386-393
Number of pages: 8