External Validation of an Algorithm to Identify Patients with High Data-Completeness in Electronic Health Records for Comparative Effectiveness Research

Cited by: 21
Authors
Lin, Kueiyu Joshua [1, 2]
Rosenthal, Gary E. [3]
Murphy, Shawn N. [4, 5]
Mandl, Kenneth D. [6]
Jin, Yinzhu [1]
Glynn, Robert J. [1]
Schneeweiss, Sebastian [1]
Affiliations
[1] Harvard Med Sch, Brigham & Womens Hosp, Dept Med, Div Pharmacoepidemiol & Pharmacoecon, 1620 Tremont St Suite 3030, Boston, MA 02120 USA
[2] Harvard Med Sch, Massachusetts Gen Hosp, Dept Med, Boston, MA 02120 USA
[3] Wake Forest Sch Med, Dept Internal Med, Winston Salem, NC 27101 USA
[4] Harvard Med Sch, Massachusetts Gen Hosp, Dept Neurol, Boston, MA 02120 USA
[5] Partners Healthcare, Res Informat Sci & Comp, Somerville, MA USA
[6] Harvard Med Sch, Boston Childrens Hosp, Computat Hlth Informat Program, Boston, MA 02120 USA
Source
CLINICAL EPIDEMIOLOGY | 2020 / Vol. 12
Keywords
electronic medical records; data linkage; comparative effectiveness research; information bias; continuity; external validation; DATA INFRASTRUCTURE; CODES; VALIDITY
DOI
10.2147/CLEP.S232540
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene]
Subject classification code
1004; 120402
Abstract
Objective: Electronic health record (EHR) data-discontinuity, i.e., receiving care outside a particular EHR system, may cause misclassification of study variables. We aimed to validate an algorithm that identifies patients with high EHR data-continuity in order to reduce such bias.
Materials and Methods: We analyzed data from two EHR systems linked with Medicare claims data from 2007 through 2014, one in Massachusetts (MA, n=80,588) and the other in North Carolina (NC, n=33,207). We quantified EHR data-continuity as the Mean Proportion of Encounters Captured (MPEC) by the EHR system, treating the claims data as the complete record of encounters. The prediction model for MPEC was developed in MA and validated in NC. Stratified by predicted EHR data-continuity, we quantified the misclassification of 40 key variables by the Mean Standardized Difference (MSD) between the proportions of these variables based on EHR data alone vs. the linked claims-EHR data.
Results: The mean MPEC was 27% in the MA system and 26% in the NC system. Predicted and observed EHR data-continuity were highly correlated (Spearman correlation = 0.78 and 0.73 in MA and NC, respectively). The misclassification (MSD) of the 40 variables in the cohort with high predicted EHR data-continuity was significantly smaller (44%, 95% CI: 40-48%) than that in the remaining population.
Discussion: Comorbidity profiles were similar in patients with high vs. low EHR data-continuity. Therefore, restricting an analysis to patients with high EHR data-continuity may reduce information bias while preserving the representativeness of the study cohort.
Conclusion: We have successfully validated an algorithm that can identify a high EHR data-continuity cohort representative of the source population.
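The abstract defines MPEC and MSD only in words. As a rough illustration, here is a minimal Python sketch of both measures under assumptions the abstract does not state: MPEC is taken as the unweighted mean of a patient's outpatient and inpatient capture proportions (claims encounters also found in the EHR divided by all claims encounters of that type), and MSD as the mean absolute standardized difference across binary variables, using the pooled-variance standardized difference commonly used for proportions. The class `PatientEncounters` and the functions `mpec`, `std_diff_binary`, and `mean_std_diff` are hypothetical names, not the authors' code, and the MPEC prediction model itself is not sketched.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Optional


@dataclass
class PatientEncounters:
    """Hypothetical per-patient counts; field names are illustrative only."""
    claims_outpatient: int  # outpatient encounters in claims (the reference record)
    ehr_outpatient: int     # how many of those claims encounters were also found in the EHR
    claims_inpatient: int   # inpatient encounters in claims
    ehr_inpatient: int      # how many of those were also found in the EHR


def _capture(ehr: int, claims: int) -> Optional[float]:
    """Share of claims encounters captured by the EHR; None if claims has none."""
    return None if claims == 0 else ehr / claims


def mpec(p: PatientEncounters) -> float:
    """Assumed MPEC: unweighted mean of the outpatient and inpatient capture
    proportions, skipping an encounter type with no claims encounters."""
    props = [
        c
        for c in (_capture(p.ehr_outpatient, p.claims_outpatient),
                  _capture(p.ehr_inpatient, p.claims_inpatient))
        if c is not None
    ]
    if not props:
        raise ValueError("no claims encounters to compare against")
    return mean(props)


def std_diff_binary(p_ehr: float, p_linked: float) -> float:
    """Standardized difference between two prevalences, using the pooled
    binomial variance (p1(1-p1) + p2(1-p2)) / 2 in the denominator."""
    pooled_var = (p_ehr * (1 - p_ehr) + p_linked * (1 - p_linked)) / 2
    if pooled_var == 0:
        # Both prevalences are 0 or 1: either no difference or an unbounded one.
        return 0.0 if p_ehr == p_linked else math.copysign(math.inf, p_linked - p_ehr)
    return (p_linked - p_ehr) / math.sqrt(pooled_var)


def mean_std_diff(ehr_only: Dict[str, float], linked: Dict[str, float]) -> float:
    """Assumed MSD: mean absolute standardized difference across variables,
    comparing prevalence from EHR alone with prevalence from linked claims-EHR data."""
    return mean(abs(std_diff_binary(ehr_only[k], linked[k])) for k in linked)


if __name__ == "__main__":
    patient = PatientEncounters(claims_outpatient=10, ehr_outpatient=3,
                                claims_inpatient=2, ehr_inpatient=1)
    print(round(mpec(patient), 3))  # 0.4
    print(round(mean_std_diff({"diabetes": 0.10}, {"diabetes": 0.20}), 3))  # 0.283
```

Running the file prints the MPEC of a toy patient with 3 of 10 outpatient and 1 of 2 inpatient claims encounters found in the EHR, and the MSD for a single made-up variable whose prevalence is 10% in the EHR alone vs. 20% in the linked data. In a stratified analysis of the kind the abstract describes, `mean_std_diff` would be computed separately within each predicted-continuity stratum.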
Pages: 133-141
Page count: 9
Related papers
(20 in total)
  • [1] A systematic review of validated methods for identifying cerebrovascular accident or transient ischemic attack using administrative data
    Andrade, Susan E.
    Harrold, Leslie R.
    Tjia, Jennifer
    Cutrona, Sarah L.
    Saczynski, Jane S.
    Dodd, Katherine S.
    Goldberg, Robert J.
    Gurwitz, Jerry H.
    [J]. PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2012, 21 : 100 - 128
  • [2] Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples
    Austin, Peter C.
    [J]. STATISTICS IN MEDICINE, 2009, 28 (25) : 3083 - 3107
  • [3] Accuracy of ICD-9-CM codes for identifying cardiovascular and stroke risk factors
    Birman-Deych, E
    Waterman, AD
    Yan, Y
    Nilasena, DS
    Radford, MJ
    Gage, BF
    [J]. MEDICAL CARE, 2005, 43 (05) : 480 - 485
  • [4] Building Data Infrastructure to Evaluate and Improve Quality: PCORnet
    Corley, Douglas A.
    Feigelson, Heather Spencer
    Lieu, Tracy A.
    McGlynn, Elizabeth A.
    [J]. JOURNAL OF ONCOLOGY PRACTICE, 2015, 11 (03) : 204 - +
  • [5] An automated database case definition for serious bleeding related to oral anticoagulant use
    Cunningham, Andrew
    Stein, C. Michael
    Chung, Cecilia P.
    Daugherty, James R.
    Smalley, Walter E.
    Ray, Wayne A.
    [J]. PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2011, 20 (06) : 560 - 566
  • [6] Deep vein thrombosis and pulmonary embolism in two cohorts: The longitudinal investigation of thromboembolism etiology
    Cushman, M
    Tsai, AW
    White, RH
    Heckbert, SR
    Rosamond, WD
    Enright, P
    Folsom, AR
    [J]. AMERICAN JOURNAL OF MEDICINE, 2004, 117 (01) : 19 - 25
  • [7] Safety of low dose glucocorticoid treatment in rheumatoid arthritis: published evidence and prospective trial data
    Da Silva, JAP
    Jacobs, JWG
    Kirwan, JR
    Boers, M
    Saag, KG
    Inês, LBS
    de Koning, EJP
    Buttgereit, F
    Cutolo, M
    Capell, H
    Rau, R
    Bijlsma, JWJ
    [J]. ANNALS OF THE RHEUMATIC DISEASES, 2006, 65 (03) : 285 - 293
  • [8] Validity of Using Inpatient and Outpatient Administrative Codes to Identify Acute Venous Thromboembolism: The CVRN VTE Study
    Fang, Margaret C.
    Fan, Dongjie
    Sung, Sue Hee
    Witt, Daniel M.
    Schmelzer, John R.
    Steinhubl, Steven R.
    Yale, Steven H.
    Go, Alan S.
    [J]. MEDICAL CARE, 2017, 55 (12) : E137 - E143
  • [9] Metrics for covariate balance in cohort studies of causal effects
    Franklin, Jessica M.
    Rassen, Jeremy A.
    Ackermann, Diana
    Bartels, Dorothee B.
    Schneeweiss, Sebastian
    [J]. STATISTICS IN MEDICINE, 2014, 33 (10) : 1685 - 1699
  • [10] A combined comorbidity score predicted mortality in elderly patients better than existing scores
    Gagne, Joshua J.
    Glynn, Robert J.
    Avorn, Jerry
    Levin, Raisa
    Schneeweiss, Sebastian
    [J]. JOURNAL OF CLINICAL EPIDEMIOLOGY, 2011, 64 (07) : 749 - 759