A novel method of adverse event detection can accurately identify venous thromboembolisms (VTEs) from narrative electronic health record data

Times cited: 52
Authors
Rochefort, Christian M. [1, 2, 3]
Verma, Aman D. [2, 3]
Eguale, Tewodros [2, 4]
Lee, Todd C. [5]
Buckeridge, David L. [2, 3]
Affiliations
[1] McGill Univ, Fac Med, Ingram Sch Nursing, Montreal, PQ H3A 1A3, Canada
[2] McGill Univ, McGill Clin & Hlth Informat Res Grp, Montreal, PQ H3A 1A3, Canada
[3] McGill Univ, Dept Epidemiol Biostat & Occupat Hlth, Fac Med, Montreal, PQ H3A 1A3, Canada
[4] Brigham & Womens Hosp, Boston, MA 02115 USA
[5] MUHC, Montreal, PQ, Canada
Funding
Canadian Institutes of Health Research
Keywords
support vector machines; automated text classification; deep vein thrombosis; pulmonary embolism; acute care hospital; natural language processing; SUPPORT VECTOR MACHINE; PULMONARY-HYPERTENSION; IDENTIFICATION; CARE; SURVEILLANCE; PREVENTION; DISEASES;
DOI
10.1136/amiajnl-2014-002768
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Background: Venous thromboembolisms (VTEs), which include deep vein thrombosis (DVT) and pulmonary embolism (PE), are associated with significant mortality, morbidity, and cost in hospitalized patients. To evaluate the success of preventive measures, accurate and efficient methods for monitoring VTE rates are needed. Therefore, we sought to determine the accuracy of statistical natural language processing (NLP) for identifying DVT and PE from electronic health record data.
Methods: We randomly sampled 2000 narrative radiology reports from patients with a suspected DVT/PE in Montreal (Canada) between 2008 and 2012. We manually identified DVT/PE within each report, which served as our reference standard. Using a bag-of-words approach, we trained 10 alternative support vector machine (SVM) models predicting DVT, and 10 predicting PE. SVM training and testing were performed with nested 10-fold cross-validation, and the average accuracy of each model was measured and compared.
Results: On manual review, 324 (16.2%) reports were DVT-positive and 154 (7.7%) were PE-positive. The best DVT model achieved an average sensitivity of 0.80 (95% CI 0.76 to 0.85), specificity of 0.98 (95% CI 0.97 to 0.99), positive predictive value (PPV) of 0.89 (95% CI 0.85 to 0.93), and an area under the curve (AUC) of 0.98 (95% CI 0.97 to 0.99). The best PE model achieved sensitivity of 0.79 (95% CI 0.73 to 0.85), specificity of 0.99 (95% CI 0.98 to 0.99), PPV of 0.84 (95% CI 0.75 to 0.92), and AUC of 0.99 (95% CI 0.98 to 1.00).
Conclusions: Statistical NLP can accurately identify VTE from narrative radiology reports.
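For readers unfamiliar with the accuracy measures reported in the abstract, the following is a minimal sketch of how sensitivity, specificity, and PPV are derived from confusion-matrix counts. The function name `diagnostic_metrics` and the counts are hypothetical and chosen only for illustration; they are not the study's data, and the sketch does not reproduce the authors' SVM pipeline or their confidence intervals.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, and PPV from confusion-matrix counts.

    tp/fn: reference-positive reports classified positive/negative
    fp/tn: reference-negative reports classified positive/negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that are detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly ruled out
        "ppv": tp / (tp + fp),          # share of positive calls that are correct
    }


# Hypothetical counts for illustration only (NOT taken from the paper).
example = diagnostic_metrics(tp=80, fp=10, fn=20, tn=890)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With a low-prevalence outcome such as PE (7.7% of reports here), PPV depends strongly on the number of false positives, which is why it is reported alongside sensitivity and specificity.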
Pages: 155-165
Number of pages: 11