Estimating Missing Data in Temporal Data Streams Using Multi-Directional Recurrent Neural Networks

Cited by: 173
Authors
Yoon, Jinsung [1]
Zame, William R. [2]
van der Schaar, Mihaela [3, 4]
Affiliations
[1] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Dept Econ & Math, Los Angeles, CA USA
[3] Univ Oxford, Dept Engn Sci, Oxford, England
[4] Alan Turing Inst, London, England
Funding
U.S. National Science Foundation
Keywords
Missing data; temporal data streams; imputation; recurrent neural nets; MULTIPLE-IMPUTATION
DOI
10.1109/TBME.2018.2874712
Chinese Library Classification (CLC) number
R318 [Biomedical engineering]
Discipline classification code
0831
Abstract
Missing data is a ubiquitous problem. It is especially challenging in medical settings because many streams of measurements are collected at different (and often irregular) times. Accurate estimation of the missing measurements is critical for many reasons, including diagnosis, prognosis, and treatment. Existing methods address this estimation problem either by interpolating within data streams or imputing across data streams (both of which ignore important information), or by ignoring the temporal aspect of the data and imposing strong assumptions about the nature of the data-generating process and/or the pattern of missing data (both of which are especially problematic for medical data). We propose a new approach, based on a novel deep learning architecture that we call a Multi-directional Recurrent Neural Network, that interpolates within data streams and imputes across data streams. We demonstrate the power of our approach by applying it to five real-world medical datasets. We show that it provides dramatically improved estimation of missing measurements in comparison to 11 state-of-the-art benchmarks (including Spline and Cubic Interpolations, MICE, MissForest, matrix completion, and several RNN methods); typical improvements in Root Mean Squared Error are between 35% and 50%. Additional experiments based on the same five datasets demonstrate that the improvements provided by our method are extremely robust.
Pages: 1477-1490
Number of pages: 14