Identifying lupus patients in electronic health records: Development and validation of machine learning algorithms and application of rule-based algorithms

Cited by: 54
Authors
Jorge, April [1]
Castro, Victor M. [2]
Barnado, April [3]
Gainer, Vivian [2]
Hong, Chuan [4]
Cai, Tianxi [2, 6]
Cai, Tianrun [5, 6]
Carroll, Robert [7]
Denny, Joshua C. [7]
Crofford, Leslie [3]
Costenbader, Karen H. [5]
Liao, Katherine P. [5, 6]
Karlson, Elizabeth W. [5]
Feldman, Candace H. [5]
Affiliations
[1] Harvard Med Sch, Massachusetts Gen Hosp, Dept Med, Div Rheumatol Allergy & Immunol, 55 Fruit St,Bulfinch 165, Boston, MA 02114 USA
[2] Partners Healthcare, Res Informat Syst & Comp, Boston, MA USA
[3] Vanderbilt Univ, Med Ctr, Div Rheumatol & Immunol, Nashville, TN USA
[4] Harvard TH Chan Sch Publ Hlth, Boston, MA USA
[5] Harvard Med Sch, Brigham & Womens Hosp, Dept Med, Div Rheumatol Immunol & Allergy, Boston, MA 02115 USA
[6] Harvard Med Sch, Dept Biomed Informat, Boston, MA 02115 USA
[7] Vanderbilt Univ, Med Ctr, Dept Biomed Informat, Nashville, TN USA
Keywords
Systemic lupus erythematosus; Bioinformatics; Electronic health records; Algorithms; CLASSIFICATION; EXTRACTION; CRITERIA;
DOI
10.1016/j.semarthrit.2019.01.002
Chinese Library Classification number
R5 [Internal Medicine]
Subject classification code
1002; 100201
Abstract
Objective: To utilize electronic health records (EHRs) to study SLE, algorithms are needed to accurately identify these patients. We used machine learning to generate data-driven SLE EHR algorithms and assessed performance of existing rule-based algorithms.

Methods: We randomly selected subjects with ≥1 SLE ICD-9/10 codes from our EHR and identified gold standard definite and probable SLE cases by chart review, based on 1997 ACR or 2012 SLICC Classification Criteria. From a training set, we extracted coded and narrative concepts using natural language processing (NLP) and generated algorithms using penalized logistic regression to classify definite or definite/probable SLE. We assessed predictive characteristics in internal and external cohort validations. We also tested performance characteristics of published rule-based algorithms with pre-specified permutations of ICD-9 codes, laboratory tests and medications in our EHR.

Results: At a specificity of 97%, our machine learning coded algorithm for definite SLE had 90% positive predictive value (PPV) and 64% sensitivity and, for definite/probable SLE, 92% PPV and 47% sensitivity. In the external validation, at 97% specificity, the definite/probable algorithm had 94% PPV and 60% sensitivity. Adding NLP concepts did not improve performance metrics. The PPVs of published rule-based algorithms ranged from 45% to 79% in our EHR.

Conclusion: Our machine learning SLE algorithms performed well in internal and external validation. Rule-based SLE algorithms did not transport as well to our EHR. Unique EHR characteristics, clinical practices and research goals regarding the desired sensitivity and specificity of the case definition must be considered when applying algorithms to identify SLE patients.

© 2019 Elsevier Inc. All rights reserved.
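
The abstract reports sensitivity and PPV at a fixed specificity of 97%. As a rough illustration of how such operating-point metrics can be derived from a classifier's scores, here is a minimal sketch. It is not the authors' code: the function name `metrics_at_specificity`, the cutoff rule, and the toy scores and labels are all assumptions made for this example, and the sketch uses plain Python rather than any particular modelling library.

```python
import math
from typing import Sequence, Tuple


def metrics_at_specificity(
    scores: Sequence[float],
    labels: Sequence[int],
    target_specificity: float = 0.97,
) -> Tuple[float, float, float, float]:
    """Hypothetical helper: find the lowest score cutoff whose specificity
    meets the target, then return (cutoff, specificity, sensitivity, PPV).

    A subject is called a case when score > cutoff. Labels are 1 for
    chart-review cases and 0 for non-cases.
    """
    negatives = sorted(s for s, y in zip(scores, labels) if y == 0)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one case and one non-case")

    # At least k non-cases must score at or below the cutoff.
    k = math.ceil(target_specificity * len(negatives))
    cutoff = negatives[k - 1] if k > 0 else float("-inf")

    tn = sum(1 for s in negatives if s <= cutoff)  # true negatives
    fp = len(negatives) - tn                       # false positives
    tp = sum(1 for s in positives if s > cutoff)   # true positives

    specificity = tn / len(negatives)
    sensitivity = tp / len(positives)
    ppv = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    return cutoff, specificity, sensitivity, ppv


if __name__ == "__main__":
    # Toy data, invented for illustration only (not from the study).
    toy_scores = [0.1, 0.2, 0.3, 0.9, 0.4, 0.8, 0.95, 0.25]
    toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
    print(metrics_at_specificity(toy_scores, toy_labels, 0.75))
```

Note that PPV, unlike sensitivity and specificity, depends on the proportion of true cases in the sample being scored, which is one reason the same algorithm can show different PPVs in different EHR cohorts.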
Pages: 84-90
Page count: 7