Automatic prediction of coronary artery disease from clinical narratives

Cited by: 41
Authors
Buchan, Kevin [1]
Filannino, Michele [2]
Uzuner, Ozlem [2]
Affiliations
[1] SUNY Albany, Dept Informat Sci, Albany, NY 12222 USA
[2] SUNY Albany, Dept Comp Sci, Albany, NY 12222 USA
Funding
National Institutes of Health (NIH), USA;
Keywords
Coronary artery disease; Prediction; Natural language processing; Machine learning; Dimensionality reduction; Ontology; RISK;
DOI
10.1016/j.jbi.2017.06.019
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Coronary Artery Disease (CAD) is not only the most common form of heart disease, but also the leading cause of death in both men and women (Coronary Artery Disease: MedlinePlus, 2015). We present a system that is able to automatically predict whether patients develop coronary artery disease based on their narrative medical histories, i.e., clinical free text. Although the free text in medical records has been used in several studies for identifying risk factors of coronary artery disease, to the best of our knowledge our work marks the first attempt at automatically predicting development of CAD. We tackle this task on a small corpus of diabetic patients. The size of this corpus makes it important to limit the number of features in order to avoid overfitting. We propose an ontology-guided approach to feature extraction, and compare it with two classic feature selection techniques. Our system achieves state-of-the-art performance of 77.4% F1 score. © 2017 The Authors. Published by Elsevier Inc.
Pages: 23-32
Number of pages: 10
Related papers
47 in total
[1] [Anonymous], 2009, MET
[2] [Anonymous], 2015, CORONARY ARTERY DIS
[3] [Anonymous], 2016, CTAKES 3 0 AP CTAKES
[4] [Anonymous], 2016, TREATM COR ART DIS M
[5] [Anonymous], 2016, NLTK UT NLTK 3 0 DOC
[6] [Anonymous], 2016, TERM FREQUENCY INVER
[7] [Anonymous], J BIOMED INFORM
[8] [Anonymous], 1992, P 4 C MESS UND MCLEA
[9] [Anonymous], 2016, SKLEARN METR NORM MU
[10] [Anonymous], 2016, SKLEARN GRID SEARCH