An exploratory data quality analysis of time series physiologic signals using a large-scale intensive care unit database

Cited by: 7
Authors
Afshar, Ali S. [1]
Li, Yijun [2]
Chen, Zixu [3]
Chen, Yuxuan [3]
Lee, Jae Hun [3]
Irani, Darius [3]
Crank, Aidan [3]
Singh, Digvijay [3]
Kanter, Michael [4]
Faraday, Nauder [1]
Kharrazi, Hadi [5, 6]
Affiliations
[1] Johns Hopkins Sch Med, Dept Anesthesiol & Crit Care Med, 600 N Wolfe St, Baltimore, MD 21205 USA
[2] Johns Hopkins Bloomberg Sch Publ Hlth, Dept Epidemiol, Baltimore, MD USA
[3] Johns Hopkins Whiting Sch Engn, Dept Comp Sci, Baltimore, MD USA
[4] Kaiser Permanente, Bernard J Tyson Sch Med, Dept Clin Sci, Pasadena, CA USA
[5] Johns Hopkins Bloomberg Sch Publ Hlth, Dept Hlth Policy & Management, Baltimore, MD USA
[6] Johns Hopkins Sch Med, Div Hlth Sci Informat, Baltimore, MD 21205 USA
Keywords
physiologic monitoring; data quality; intensive care unit; electronic health records; completeness
DOI
10.1093/jamiaopen/ooab057
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health administration)]
Abstract
Physiological data, such as heart rate and blood pressure, are critical to clinical decision-making in the intensive care unit (ICU). Vital signs data, which are available from electronic health records, can be used to diagnose and predict important clinical outcomes. While there have been some reports on the data quality of nurse-verified vital sign data, little has been reported on the data quality of the higher-frequency time-series vital signs acquired in ICUs that would enable such predictive modeling. In this study, we assessed data quality, defined in terms of completeness, accuracy, and timeliness, of minute-by-minute time-series vital signs data within the MIMIC-III data set, captured from 16,009 patient-ICU stays corresponding to 9,410 unique adult patients. We measured the data quality of four time-series vital sign data streams in MIMIC-III: heart rate (HR), respiratory rate (RR), blood oxygen saturation (SpO2), and arterial blood pressure (ABP). For each of HR, RR, and SpO2, approximately 30% of patient-ICU stays had no data at all (not even 1 min) during the ICU stay; for ABP, this proportion was approximately 56%. We observed approximately 80% coverage of the total duration of the ICU stay for HR, RR, and SpO2. Finally, only 12.5%, 9.9%, 7.5%, and 4.4% of ICU stays had data available for at least 99% of the length of stay for HR, RR, SpO2, and ABP, respectively, the level of availability that would meet the three data quality requirements examined in this study. Our findings on data completeness, accuracy, and timeliness have important implications for data scientists and informatics researchers who use time-series vital signs data to develop predictive models of ICU outcomes.
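The completeness figures in the abstract (share of stays with no data, coverage of the stay, share of stays with at least 99% data) can be illustrated with a minimal Python sketch. It assumes each vital-sign stream for a stay is available as a list of sample timestamps, bins those samples into whole minutes counted from the stay start, and aggregates coverage with a simple mean. `IcuStay`, `minute_coverage`, and `summarize` are hypothetical names; the binning rule and the mean aggregation are assumptions for illustration, not the study's published method or the MIMIC-III schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List

ONE_MINUTE = timedelta(minutes=1)


@dataclass
class IcuStay:
    """Hypothetical container: one patient-ICU stay and the timestamps
    of a single vital-sign stream (e.g. HR) recorded during it."""
    start: datetime
    end: datetime
    sample_times: List[datetime] = field(default_factory=list)


def minute_coverage(stay: IcuStay) -> float:
    """Fraction of the stay's whole minutes that contain at least one sample.

    Assumption: minutes are counted from the stay start, a trailing
    partial minute is ignored, and samples outside the stay are dropped.
    """
    total_minutes = (stay.end - stay.start) // ONE_MINUTE
    if total_minutes <= 0:
        return 0.0
    covered = set()
    for t in stay.sample_times:
        offset = t - stay.start
        if offset < timedelta(0):
            continue  # sample precedes the stay
        idx = offset // ONE_MINUTE
        if idx < total_minutes:
            covered.add(idx)  # several samples in one minute count once
    return len(covered) / total_minutes


def summarize(stays: List[IcuStay], threshold: float = 0.99) -> Dict[str, float]:
    """Cohort-level completeness percentages for one vital-sign stream."""
    if not stays:
        raise ValueError("need at least one stay")
    coverages = [minute_coverage(s) for s in stays]
    n = len(coverages)
    return {
        # stays without even one minute of data
        "pct_stays_no_data": 100 * sum(c == 0 for c in coverages) / n,
        # mean share of the stay that is covered (aggregation is an assumption)
        "mean_coverage_pct": 100 * mean(coverages),
        # stays meeting the >= 99% availability cut-off
        "pct_stays_at_threshold": 100 * sum(c >= threshold for c in coverages) / n,
    }


if __name__ == "__main__":
    t0 = datetime(2020, 1, 1)
    ten_min = t0 + 10 * ONE_MINUTE
    stays = [
        IcuStay(t0, ten_min, [t0 + m * ONE_MINUTE for m in range(10)]),  # every minute
        IcuStay(t0, ten_min, [t0 + m * ONE_MINUTE for m in range(5)]),   # 5 of 10 minutes
        IcuStay(t0, ten_min, []),                                         # no data at all
    ]
    print(summarize(stays))
```

Run once per stream (HR, RR, SpO2, ABP), the three outputs correspond to the three kinds of figures reported above; in the toy example the stays have coverages of 1.0, 0.5, and 0.0. Counting distinct covered minutes, rather than raw samples, keeps a burst of readings within one minute from inflating coverage.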
Pages: 6