Deep imputation of missing values in time series health data: A review with benchmarking

Cited by: 13
Authors
Kazijevs, Maksims [1]
Samad, Manar D. [1]
Affiliations
[1] Tennessee State Univ, Dept Comp Sci, Nashville, TN 37209 USA
Funding
National Institutes of Health (NIH), USA
Keywords
Time series; Multivariate data; Longitudinal imputation; Cross-sectional imputation; Missing value imputation; Deep neural network; Electronic health records; Sensor data
DOI
10.1016/j.jbi.2023.104440
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The imputation of missing values in multivariate time series (MTS) data is critical in ensuring data quality and producing reliable data-driven predictive models. Apart from many statistical approaches, a few recent studies have proposed state-of-the-art deep learning methods to impute missing values in MTS data. However, the evaluation of these deep methods is limited to one or two data sets, low missing rates, and completely random missing value types. This survey performs six data-centric experiments to benchmark state-of-the-art deep imputation methods on five time series health data sets. Our extensive analysis reveals that no single imputation method outperforms the others on all five data sets. The imputation performance depends on data types, individual variable statistics, missing value rates, and types. Deep learning methods that jointly perform cross-sectional (across variables) and longitudinal (across time) imputations of missing values in time series data yield statistically better data quality than traditional imputation methods. Although computationally expensive, deep learning methods are practical given the current availability of high-performance computing resources, especially when data quality and sample size are of paramount importance in healthcare informatics. Our findings highlight the importance of data-centric selection of imputation methods to optimize data-driven predictive models.
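As a rough illustration of the two imputation directions named in the abstract, the sketch below fills a gap longitudinally (linear interpolation along time within one variable) and cross-sectionally (a least-squares line from a second variable at the same time step), then scores both on the hidden entry. It is a toy baseline, not any of the deep methods benchmarked in the paper; the function names and the two-variable data are hypothetical.

```python
import math


def impute_longitudinal(series):
    """Fill gaps in one variable by linear interpolation along time.

    Leading and trailing gaps copy the nearest observed value.
    """
    observed = [i for i, v in enumerate(series) if v is not None]
    if not observed:
        return list(series)  # nothing to anchor on
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        before = [j for j in observed if j < i]
        after = [j for j in observed if j > i]
        if not before:
            filled[i] = series[after[0]]
        elif not after:
            filled[i] = series[before[-1]]
        else:
            lo, hi = before[-1], after[0]
            weight = (i - lo) / (hi - lo)
            filled[i] = series[lo] + weight * (series[hi] - series[lo])
    return filled


def impute_cross_sectional(target, predictor):
    """Fill gaps in `target` from `predictor` at the same time step.

    Uses a least-squares line fitted on steps where both are observed.
    """
    pairs = [(x, y) for x, y in zip(predictor, target)
             if x is not None and y is not None]
    if not pairs:
        return list(target)
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = cov_xy / var_x if var_x else 0.0
    intercept = mean_y - slope * mean_x
    return [
        intercept + slope * x if y is None and x is not None else y
        for x, y in zip(predictor, target)
    ]


def rmse_on_masked(truth, imputed, masked_idx):
    """Root-mean-square error restricted to the entries that were hidden."""
    errors = [(truth[i] - imputed[i]) ** 2 for i in masked_idx]
    return math.sqrt(sum(errors) / len(errors))


# Hypothetical toy data: two correlated variables over five time steps.
var_a = [60.0, 62.0, 70.0, 66.0, 68.0]
var_b = [30.0, 31.0, 35.0, 33.0, 34.0]  # exactly var_a / 2
masked_idx = [2]                         # hide one entry of var_b
var_b_observed = [None if i in masked_idx else v for i, v in enumerate(var_b)]

longitudinal = impute_longitudinal(var_b_observed)
cross_sectional = impute_cross_sectional(var_b_observed, var_a)

print("longitudinal RMSE:", rmse_on_masked(var_b, longitudinal, masked_idx))
print("cross-sectional RMSE:", rmse_on_masked(var_b, cross_sectional, masked_idx))
```

In this toy case the hidden value is a spike that the time neighbours do not reveal but the companion variable does, so the cross-sectional fill scores better. With a smooth, weakly correlated series the ranking would likely reverse, which is the kind of data dependence the abstract describes.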
Pages: 18