Natural language processing for the assessment of cardiovascular disease comorbidities: The cardio-Canary comorbidity project

Cited by: 17
Authors
Berman, Adam N. [1]
Biery, David W. [1]
Ginder, Curtis [2]
Hulme, Olivia L. [2]
Marcusa, Daniel [2]
Leiva, Orly [2]
Wu, Wanda Y. [1]
Cardin, Nicholas [3]
Hainer, Jon [4]
Bhatt, Deepak L. [1]
Di Carli, Marcelo F. [1, 4]
Turchin, Alexander [3]
Blankstein, Ron [1, 4]
Affiliations
[1] Brigham & Womens Hosp, Harvard Med Sch, Dept Med, Cardiovasc Div, 75 Francis St, Boston, MA 02115 USA
[2] Brigham & Womens Hosp, Harvard Med Sch, Dept Med, Boston, MA 02115 USA
[3] Brigham & Womens Hosp, Harvard Med Sch, Dept Med, Div Endocrinol, Boston, MA 02215 USA
[4] Brigham & Womens Hosp, Harvard Med Sch, Dept Radiol, 75 Francis St, Boston, MA 02115 USA
Keywords
cardiovascular comorbidities; natural language processing; administrative data; performance; records; stroke
DOI
10.1002/clc.23687
Chinese Library Classification (CLC) number
R5 [Internal Medicine]
Subject classification code
1002; 100201
Abstract
Objective: Accurate ascertainment of comorbidities is paramount in clinical research. While manual adjudication is labor-intensive and expensive, the adoption of electronic health records enables computational analysis of free-text documentation using natural language processing (NLP) tools.
Hypothesis: We sought to develop highly accurate NLP modules to assess for the presence of five key cardiovascular comorbidities in a large electronic health record system.
Methods: One thousand clinical notes were randomly selected from a cardiovascular registry at Mass General Brigham. Trained physicians manually adjudicated these notes for the following five diagnostic comorbidities: hypertension, dyslipidemia, diabetes, coronary artery disease, and stroke/transient ischemic attack. Using the open-source Canary NLP system, five separate NLP modules were designed based on 800 "training-set" notes and validated on 200 "test-set" notes.
Results: Across the five NLP modules, the sentence-level and note-level sensitivity, specificity, and positive predictive value were always greater than 85% and most often greater than 90%. Accuracy tended to be highest for conditions with greater diagnostic clarity (e.g., diabetes and hypertension) and slightly lower for conditions whose greater diagnostic challenges (e.g., myocardial infarction and embolic stroke) may lead to less definitive documentation.
Conclusion: We designed five open-source and highly accurate NLP modules that can be used to assess for the presence of important cardiovascular comorbidities in free-text health records. These modules have been placed in the public domain and can be used for clinical research, trial recruitment, and population management at any institution, as well as serve as the basis for further development of cardiovascular NLP tools.
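The note-level metrics named in the abstract (sensitivity, specificity, and positive predictive value) can be illustrated with a minimal Python sketch. It assumes each test-set note carries one physician-adjudicated boolean label and one NLP-derived boolean label for a given comorbidity. The names `Counts`, `tally`, and `metrics` are hypothetical and are not part of the Canary NLP system or the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix cells for one comorbidity (hypothetical helper)."""
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0


def tally(reference, predicted):
    """Compare physician labels (reference) with NLP labels (predicted).

    Both arguments are equal-length sequences of booleans, one per note.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    counts = Counts()
    for ref, pred in zip(reference, predicted):
        if ref and pred:
            counts.tp += 1
        elif pred:
            counts.fp += 1
        elif ref:
            counts.fn += 1
        else:
            counts.tn += 1
    return counts


def metrics(counts):
    """Return sensitivity, specificity, and PPV; None when a denominator is zero."""
    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(counts.tp, counts.tp + counts.fn),
        "specificity": ratio(counts.tn, counts.tn + counts.fp),
        "ppv": ratio(counts.tp, counts.tp + counts.fp),
    }


# Toy labels for eight notes (illustrative only, not data from the study).
reference = [True, True, True, False, False, False, False, True]
predicted = [True, True, False, False, False, True, False, True]
print(metrics(tally(reference, predicted)))
```

Sentence-level metrics would follow the same pattern with one label pair per sentence instead of per note.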
Pages: 1296-1304
Page count: 9