Combining Fourier and lagged k-nearest neighbor imputation for biomedical time series data

Times cited: 45
Authors
Rahman, Shah Atiqur [1]
Huang, Yuxiao [1]
Claassen, Jan [2]
Heintzman, Nathaniel [3]
Kleinberg, Samantha [1]
Affiliations
[1] Stevens Inst Technol, Dept Comp Sci, Hoboken, NJ 07030 USA
[2] Columbia Univ, Coll Phys & Surg, Div Crit Care Neurol, Dept Neurol, New York, NY USA
[3] Dexcom Inc, San Diego, CA USA
Keywords
Missing data; Imputation; Time series; Biomedical data; MISSING DATA; MULTIPLE IMPUTATION; VALUES
DOI
10.1016/j.jbi.2015.10.004
CLC (Chinese Library Classification) number
TP39 [Computer applications]
Discipline codes
081203; 0835
Abstract
Most clinical and biomedical data contain missing values. A patient's record may be split across multiple institutions, devices may fail, and sensors may not be worn at all times. While these missing values are often ignored, doing so can lead to bias and error when the data are mined. Further, the data are not simply missing at random. Instead, the measurement of a variable such as blood glucose may depend on its prior values as well as on those of other variables. These dependencies also exist across time, but current methods have yet to incorporate these temporal relationships or to handle multiple types of missingness. To address this, we propose an imputation method (FLk-NN) that incorporates time-lagged correlations both within and across variables by combining two imputation methods, one based on an extension of k-NN and one based on the Fourier transform. This enables imputation of missing values even when all data at a time point are missing and when there are different types of missingness both within and across variables. In comparison with other approaches on three biological datasets (simulated and actual Type 1 diabetes datasets, and multi-modality neurological ICU monitoring data), the proposed method has the highest imputation accuracy. This held with up to half of the data missing and when runs of consecutive missing values made up a significant fraction of the overall time series length. (C) 2015 Elsevier Inc. All rights reserved.
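The abstract names the two ingredients of FLk-NN (a Fourier-based imputer and a time-lagged extension of k-NN) without giving their details. The following is a minimal Python sketch of those two ideas and one way to combine them, assuming a single target series with `None` marking gaps. It is not the authors' FLk-NN: the low-pass Fourier projection (a Papoulis-Gerchberg-style iteration), the lag-embedded distance, the fixed blending weight `alpha`, and the function names `fourier_impute`, `lagged_knn_impute`, and `combined_impute` are all hypothetical choices made for illustration.

```python
# Illustrative sketch only; NOT the published FLk-NN algorithm.
# All function names and parameters here are hypothetical.
import cmath
import math
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing sample


def fourier_impute(x: Series, keep: int = 2, iters: int = 20) -> List[float]:
    """Fill gaps by iteratively projecting onto the `keep` lowest DFT frequencies.

    Assumes at least one observed value. Uses a naive O(n^2) DFT, which is
    only reasonable for short illustrative series.
    """
    n = len(x)
    observed = [v for v in x if v is not None]
    mean = sum(observed) / len(observed)
    y = [v if v is not None else mean for v in x]  # start from a mean fill
    for _ in range(iters):
        spectrum = [sum(y[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
                    for f in range(n)]
        # Keep DC, the `keep` lowest positive frequencies and their conjugate
        # partners (n - f), so the inverse transform stays real-valued.
        for f in range(n):
            if min(f, n - f) > keep:
                spectrum[f] = 0
        recon = [sum(spectrum[f] * cmath.exp(2j * math.pi * f * t / n)
                     for f in range(n)).real / n for t in range(n)]
        # Observed samples are never overwritten; only the gaps are updated.
        y = [x[t] if x[t] is not None else recon[t] for t in range(n)]
    return y


def lagged_features(variables: List[Series], t: int, lag: int) -> Series:
    """Values of every variable at times t, t-1, ..., t-lag (None if unavailable)."""
    return [s[t - d] if 0 <= t - d < len(s) else None
            for s in variables for d in range(lag + 1)]


def lagged_knn_impute(target: Series, others: List[Series],
                      lag: int = 2, k: int = 3) -> Series:
    """Average the target at the k observed time points whose lagged,
    multivariate context is closest to the context around each gap."""
    variables = [target] + others
    candidates = [c for c, v in enumerate(target) if v is not None]
    out = list(target)
    for t, v in enumerate(target):
        if v is not None:
            continue
        query = lagged_features(variables, t, lag)
        scored = []
        for c in candidates:
            diffs = [(a - b) ** 2
                     for a, b in zip(query, lagged_features(variables, c, lag))
                     if a is not None and b is not None]
            if diffs:
                # Mean squared difference over the dimensions both contexts share.
                scored.append((sum(diffs) / len(diffs), c))
        nearest = sorted(scored)[:k]
        if nearest:
            out[t] = sum(target[c] for _, c in nearest) / len(nearest)
    return out


def combined_impute(target: Series, others: List[Series],
                    alpha: float = 0.5) -> List[float]:
    """Blend both estimates at each gap with an assumed fixed weight `alpha`;
    fall back to the Fourier estimate when k-NN finds no comparable context."""
    f_est = fourier_impute(target)
    k_est = lagged_knn_impute(target, others)
    return [target[t] if target[t] is not None
            else (f_est[t] if k_est[t] is None
                  else alpha * f_est[t] + (1 - alpha) * k_est[t])
            for t in range(len(target))]
```

In this sketch the Fourier estimate uses only the target's own observed values, so it still yields a value when every variable is missing at a time point, while the lagged k-NN estimate draws on the context of other variables and earlier samples. How the published method actually weights or switches between its two components is not stated in the abstract.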
Pages: 198-207
Number of pages: 10