Using local lexicalized rules to identify heart disease risk factors in clinical notes

Cited by: 20
Authors
Karystianis, George [1, 5]
Dehghan, Azad [1, 5]
Kovacevic, Aleksandar [2]
Keane, John A. [1, 4]
Nenadic, Goran [1, 3, 4]
Affiliations
[1] Univ Manchester, Sch Comp Sci, Manchester, Lancs, England
[2] Univ Novi Sad, Fac Tech Sci, Novi Sad, Serbia
[3] Farr Inst Hlth Informat Res, Hlth eRes Ctr, Manchester, Lancs, England
[4] Univ Manchester, Manchester Inst Biotechnol, Manchester, Lancs, England
[5] Christie NHS Fdn Trust, Manchester, Lancs, England
Keywords
Text mining; Risk factors; Heart disease; Vocabularies; Rule-based modelling
DOI
10.1016/j.jbi.2015.06.013
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Heart disease is the leading cause of death globally and a significant part of the human population lives with it. A number of risk factors have been recognized as contributing to the disease, including obesity, coronary artery disease (CAD), hypertension, hyperlipidemia, diabetes, smoking, and family history of premature CAD. This paper describes and evaluates a methodology to extract mentions of such risk factors from diabetic clinical notes, which was a task of the i2b2/UTHealth 2014 Challenge in Natural Language Processing for Clinical Data. The methodology is knowledge-driven: the system implements local lexicalized rules (based on syntactical patterns observed in notes) combined with manually constructed dictionaries that characterize the domain. Part of the task was also to detect the time interval in which the risk factors were present in a patient. The system was applied to an evaluation set of 514 unseen notes and achieved a micro-average F-score of 88% (with 86% precision and 90% recall). While the identification of CAD family history, medication and some of the related disease factors (e.g. hypertension, diabetes, hyperlipidemia) showed quite good results, the identification of CAD-specific indicators proved to be more challenging (F-score of 74%). Overall, the results are encouraging and suggest that automated text mining methods can be used to process clinical notes to identify risk factors and monitor progression of heart disease on a large scale, providing necessary data for clinical and epidemiological studies. © 2015 Elsevier Inc. All rights reserved.
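The general idea of combining a hand-built dictionary with a local lexicalized rule can be sketched as follows. This is only an illustration under stated assumptions, not the authors' system: the dictionary entries, the negation cues, the 30-character context window, and the names `RISK_TERMS`, `NEGATION`, `extract_risk_factors`, and `f_score` are all hypothetical. The small `f_score` helper shows that the reported 86% precision and 90% recall are consistent with an F-score of about 88% under the standard harmonic-mean definition.

```python
import re

# Hypothetical mini-dictionary: risk factor -> surface terms (not taken from the paper).
RISK_TERMS = {
    "hypertension": ["hypertension", "htn", "high blood pressure"],
    "diabetes": ["diabetes mellitus", "diabetes", "dm"],
    "hyperlipidemia": ["hyperlipidemia", "high cholesterol"],
}

# Hypothetical local rule: a negation cue up to 30 characters before the term,
# within the same sentence, suppresses the mention.
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b[^.;]{0,30}$", re.IGNORECASE)


def extract_risk_factors(note):
    """Return the set of risk factors mentioned (and not negated) in a note."""
    found = set()
    for factor, terms in RISK_TERMS.items():
        # Try longer terms first so multi-word entries win over their prefixes.
        ordered = sorted(terms, key=len, reverse=True)
        pattern = r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b"
        for match in re.finditer(pattern, note, re.IGNORECASE):
            left_context = note[:match.start()]
            if not NEGATION.search(left_context):
                found.add(factor)
    return found


def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    note = "Patient has HTN and diabetes mellitus. Denies high cholesterol."
    print(sorted(extract_risk_factors(note)))
    print(round(f_score(0.86, 0.90), 2))
```

A real system of this kind would need far larger dictionaries, section-aware rules, and separate handling of the time-interval attribute mentioned in the abstract, none of which is attempted here.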
Pages: S183-S188
Number of pages: 6