Leveraging electronic health records for data science: common pitfalls and how to avoid them

Cited by: 57
Authors
Sauer, Christopher M. [1 ,2 ]
Chen, Li-Ching [3 ]
Hyland, Stephanie L. [4 ]
Girbes, Armand [1 ]
Elbers, Paul [1 ]
Celi, Leo A. [2 ,5 ,6 ]
Affiliations
[1] Amsterdam UMC, Locat VUmc, Lab Crit Care Computat Intelligence,Amsterdam Ins, Amsterdam Med Data Sci,Amsterdam Cardiovasc Sci,D, NL-1081 HV Amsterdam, Netherlands
[2] MIT, Lab Computat Physiol, Inst Med Engn & Sci, Cambridge, MA USA
[3] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu, Taiwan
[4] Microsoft Res, Cambridge, England
[5] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA USA
[6] Beth Israel Deaconess Med Ctr, Div Pulm Crit Care & Sleep Med, Boston, MA USA
Source
LANCET DIGITAL HEALTH | 2022 / Volume 4 / Issue 12
Funding
US National Institutes of Health;
Keywords
INTERNATIONAL CONSENSUS DEFINITIONS; SELECTION BIAS; SEVERE SEPSIS; DATA QUALITY; CARE; EPIDEMIOLOGY; GUIDELINES; TRIALS;
DOI
10.1016/S2589-7500(22)00154-6
Chinese Library Classification number
R-058 [];
Discipline classification code
Abstract
Analysis of electronic health records (EHRs) is an increasingly common approach for studying real-world patient data. Use of routinely collected data offers several advantages compared with other study designs, including reduced administrative costs, the ability to update analysis as practice patterns evolve, and larger sample sizes. Methodologically, EHR analysis is subject to distinct challenges because data are not collected for research purposes. In this Viewpoint, we elaborate on the importance of in-depth knowledge of clinical workflows and describe six potential pitfalls to be avoided when working with EHR data, drawing on examples from the literature and our experience. We propose solutions for prevention or mitigation of factors associated with each of these six pitfalls: sample selection bias, imprecise variable definitions, limitations to deployment, variable measurement frequency, subjective treatment allocation, and model overfitting. Ultimately, we hope that this Viewpoint will guide researchers to further improve the methodological robustness of EHR analysis.
Pages: E893-E898
Page count: 6