Novel Data Imputation for Multiple Types of Missing Data in Intensive Care Units

Cited by: 16
Authors
Venugopalan, Janani [1]
Chanani, Nikhil [2]
Maher, Kevin [2]
Wang, May D. [1]
Affiliations
[1] Emory Univ, Georgia Inst Technol, Wallace H Coulter Sch Biomed Engn, Atlanta, GA 30322 USA
[2] Emory Univ, Pediat Dept, Atlanta, GA 30322 USA
Funding
US National Institutes of Health (NIH);
Keywords
Clinical risk prediction; data imputation; intensive care units; missing data; quality control; LENGTH-OF-STAY; CLINICAL-TRIALS; MORTALITY; PATTERNS; MODELS; SYSTEM; MOTION;
DOI
10.1109/JBHI.2018.2883606
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The diversity and number of parameters monitored in an intensive care unit (ICU) make the resulting databases highly susceptible to quality issues, such as missing information and erroneous data entry, which adversely affect downstream processing and predictive modeling. Missing data interpolation and imputation techniques, such as multiple imputation, expectation maximization, and hot-deck imputation, do not account for the type of missing data, which can lead to bias. In our study, we first model the missing data as three types: "neglectable," also known as "missing completely at random"; "recoverable," a.k.a. "missing at random"; and "not easily recoverable," a.k.a. "missing not at random." We then design imputation techniques for each type of missing data. We use a publicly available database (MIMIC II) to demonstrate how these imputations perform with random forests for prediction. Our results indicate that these novel imputation techniques outperformed standard mean filling techniques and expectation maximization with statistical significance (p <= 0.01) in predicting ICU mortality.
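The three missingness types named in the abstract can be illustrated with a small sketch. This is not the authors' imputation method; it only shows, under simple assumptions, how values might go missing under each mechanism and why the mean-filling baseline can be biased. The function names (make_missing, mean_fill), the `driver` variable, and the threshold rule are hypothetical choices made for illustration.

```python
import numpy as np

def make_missing(x, driver, mechanism, rate=0.3, seed=0):
    """Return a copy of x with NaNs injected under one missingness mechanism.

    x      : the variable that will lose values
    driver : a fully observed companion variable (used only for MAR)
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    driver = np.asarray(driver, dtype=float)
    if mechanism == "MCAR":
        # "Neglectable": missingness is independent of all values.
        mask = rng.random(x.shape[0]) < rate
    elif mechanism == "MAR":
        # "Recoverable": missingness depends only on an observed variable.
        mask = driver > np.quantile(driver, 1 - rate)
    elif mechanism == "MNAR":
        # "Not easily recoverable": missingness depends on the hidden value itself.
        mask = x > np.quantile(x, 1 - rate)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    out = x.copy()
    out[mask] = np.nan
    return out

def mean_fill(x):
    """Baseline: replace every NaN with the mean of the observed values."""
    filled = np.array(x, dtype=float)
    filled[np.isnan(filled)] = np.nanmean(filled)
    return filled

if __name__ == "__main__":
    values = np.arange(10.0)       # true mean is 4.5
    companion = values[::-1]       # illustrative observed variable
    for mech in ("MCAR", "MAR", "MNAR"):
        damaged = make_missing(values, companion, mech)
        print(mech, "mean after mean filling:", mean_fill(damaged).mean())
```

In this toy setup the MNAR case removes the largest values, so mean filling pulls the estimate below the true mean, which is the kind of bias the abstract attributes to type-agnostic imputation.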
Pages: 1243-1250
Page count: 8
Related papers
59 in total
[1]   A Review of Hot Deck Imputation for Survey Non-response [J].
Andridge, Rebecca R. ;
Little, Roderick J. A. .
INTERNATIONAL STATISTICAL REVIEW, 2010, 78 (01) :40-64
[2]  
[Anonymous], CRIT CARE MED, V40, P952, DOI [10.1097/CCM.0b013-3182373157, DOI 10.1097/CCM.0B013E31820A92C6, DOI 10.1097/CCM.0B013-3182373157]
[3]   A hybrid method for imputation of missing values using optimized fuzzy c-means with support vector regression and a genetic algorithm [J].
Aydilek, Ibrahim Berkan ;
Arslan, Ahmet .
INFORMATION SCIENCES, 2013, 233 :25-35
[4]   Dexamethasone in patients with acute lung injury from acute monocytic leukaemia [J].
Azoulay, E. ;
Canet, E. ;
Raffoux, E. ;
Lengline, E. ;
Lemiale, V. ;
Vincent, F. ;
de Labarthe, A. ;
Seguin, A. ;
Boissel, N. ;
Dombret, H. ;
Schlemmer, B. .
EUROPEAN RESPIRATORY JOURNAL, 2012, 39 (03) :648-653
[5]   Learning motion patterns of people for compliant robot motion [J].
Bennewitz, M ;
Burgard, W ;
Cielniak, G ;
Thrun, S .
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2005, 24 (01) :31-48
[6]   Preliminary evidence for persistent abnormalities in amygdala volumes in adolescents and young adults with bipolar disorder [J].
Blumberg, HP ;
Fredericks, C ;
Wang, F ;
Kalmar, JH ;
Spencer, L ;
Papademetris, X ;
Pittman, B ;
Martin, A ;
Peterson, BS ;
Fulbright, RK ;
Krystal, JH .
BIPOLAR DISORDERS, 2005, 7 (06) :570-576
[7]  
Botsis Taxiarchis, 2010, Summit Transl Bioinform, V2010, P1
[8]  
Bouyé E., 2000, COPULAS FINANCE READ, DOI 10.2139/ssrn.1032533
[9]  
Chen Y, 2014, IEEE ENG MED BIO, P4310, DOI 10.1109/EMBC.2014.6944578
[10]   Using EHR data to predict hospital-acquired pressure ulcers: A prospective study of a Bayesian Network model [J].
Cho, Insook ;
Park, Ihnsook ;
Kim, Eunman ;
Lee, Eunjoon ;
Bates, David W. .
INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2013, 82 (11) :1059-1067