Issues With Variability in Electronic Health Record Data About Race and Ethnicity: Descriptive Analysis of the National COVID Cohort Collaborative Data Enclave

Cited by: 17
Authors
Cook, Lily [1]
Espinoza, Juan [2]
Weiskopf, Nicole G. [1]
Mathews, Nisha [3]
Dorr, David A. [1]
Gonzales, Kelly L. [4, 5, 6, 7]
Wilcox, Adam [8]
Madlock-Brown, Charisse [9]
Affiliations
[1] Oregon Hlth & Sci Univ, Sch Med, Dept Med Informat & Clin Epidemiol, Portland, OR 97201 USA
[2] Childrens Hosp Los Angeles, Dept Pediat, Los Angeles, CA 90027 USA
[3] Univ Houston, Coll Human Sci & Humanities, Clear Lake Pearland, TX USA
[4] Citizen Cherokee Nation, Portland, OR USA
[5] Portland State Univ, Oregon Hlth & Sci Univ, Joint Sch Publ Hlth, Portland, OR 97207 USA
[6] BIPOC Decolonizing Data Council, Portland, OR USA
[7] Indigenous Equ Inst, Portland, OR USA
[8] Washington Univ, Dept Med, Inst Informat, St Louis, MO USA
[9] Univ Tennessee, Ctr Hlth Sci, Tennessee Clin & Translat Sci Inst, Memphis, TN 38163 USA
Funding
US National Institutes of Health
Keywords
social determinants of health; health equity; bias; data quality; data harmonization; data standards; terminology; data aggregation; ACCURACY; QUALITY; STATES;
DOI
10.2196/39235
Chinese Library Classification number
R-058
Abstract
Background: The adverse impact of COVID-19 on marginalized and under-resourced communities of color has highlighted the need for accurate, comprehensive race and ethnicity data. However, a significant technical challenge related to integrating race and ethnicity data in large, consolidated databases is the lack of consistency in how data about race and ethnicity are collected and structured by health care organizations.
Objective: This study aims to evaluate and describe variations in how health care systems collect and report information about the race and ethnicity of their patients and to assess how well these data are integrated when aggregated into a large clinical database.
Methods: At the time of our analysis, the National COVID Cohort Collaborative (N3C) Data Enclave contained records from 6.5 million patients contributed by 56 health care institutions. We quantified the variability in the harmonized race and ethnicity data in the N3C Data Enclave by analyzing the conformance to health care standards for such data. We conducted a descriptive analysis by comparing the harmonized data available for research purposes in the database to the original source data contributed by health care institutions. To make the comparison, we tabulated the original source codes, enumerating how many patients had been reported with each encoded value and how many distinct ways each category was reported. The nonconforming data were also cross tabulated by 3 factors: patient ethnicity, the number of data partners using each code, and which data models utilized those particular encodings. For the nonconforming data, we used an inductive approach to sort the source encodings into categories. For example, values such as "Declined" were grouped with "Refused," and "Multiple Race" was grouped with "Two or more races" and "Multiracial."
Results: "No matching concept" was the second largest harmonized concept used by the N3C to describe the race of patients in their database. In addition, 20.7% of the race data did not conform to the standard; the largest category was data that were missing. Hispanic or Latino patients were overrepresented in the nonconforming racial data, and data from American Indian or Alaska Native patients were obscured. Although only a small proportion of the source data had not been mapped to the correct concepts (0.6%), Black or African American and Hispanic/Latino patients were overrepresented in this category.
Conclusions: Differences in how race and ethnicity data are conceptualized and encoded by health care institutions can affect the quality of the data in aggregated clinical databases. The impact of data quality issues in the N3C Data Enclave was not equal across all races and ethnicities, which has the potential to introduce bias in analyses and conclusions drawn from these data. Transparency about how data have been transformed can help users make accurate analyses and inferences and eventually better guide clinical care and public policy.
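The tabulation step described in the Methods (counting patients per source encoding and counting how many distinct encodings fall into each category) can be illustrated with a minimal sketch. This is not the authors' code: the function name `tabulate_source_codes`, the `CATEGORY_BY_VALUE` mapping, the "Uncategorized" fallback, and the input shape are all assumptions made for illustration. Only the two example groupings ("Declined" with "Refused"; "Multiple Race" with "Two or more races" and "Multiracial") come from the abstract.

```python
from collections import defaultdict

# Hypothetical lookup from a normalized source value to a category label.
# Only these two groupings are taken from the abstract; the labels chosen
# for the categories are illustrative.
CATEGORY_BY_VALUE = {
    "declined": "Refused",
    "refused": "Refused",
    "multiple race": "Multiracial",
    "two or more races": "Multiracial",
    "multiracial": "Multiracial",
}


def tabulate_source_codes(records):
    """Summarize (source_value, patient_count) pairs by category.

    Returns a dict mapping each category to the total number of patients
    and the number of distinct source encodings observed for it.
    """
    patients = defaultdict(int)
    encodings = defaultdict(set)
    for value, count in records:
        key = value.strip().lower()
        # Values with no known grouping go to an assumed fallback bucket.
        category = CATEGORY_BY_VALUE.get(key, "Uncategorized")
        patients[category] += count
        encodings[category].add(value)
    return {
        category: {
            "patients": patients[category],
            "distinct_encodings": len(encodings[category]),
        }
        for category in patients
    }
```

Called with a list such as `[("Declined", 10), ("Refused", 5)]` (made-up counts), the function reports one "Refused" category covering 15 patients and 2 distinct encodings, which mirrors the kind of summary the Methods describe.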
Pages: 13