Deep imputation of missing values in time series health data: A review with benchmarking

Times cited: 14

Authors:

Kazijevs, Maksims [1]

Samad, Manar D. [1]

Affiliation:

[1] Tennessee State Univ, Dept Comp Sci, Nashville, TN 37209 USA

Source:

JOURNAL OF BIOMEDICAL INFORMATICS | 2023 / Vol. 144

Funding:

National Institutes of Health (NIH);

Keywords:

Time series; Multivariate data; Longitudinal imputation; Cross-sectional imputation; Missing value imputation; Deep neural network; Electronic health records; Sensor data;

DOI:

10.1016/j.jbi.2023.104440

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Subject classification codes:

081203; 0835;

Abstract:

The imputation of missing values in multivariate time series (MTS) data is critical to ensuring data quality and producing reliable data-driven predictive models. Apart from many statistical approaches, a few recent studies have proposed state-of-the-art deep learning methods to impute missing values in MTS data. However, the evaluation of these deep methods has been limited to one or two data sets, low missing rates, and missing values that occur completely at random. This survey performs six data-centric experiments to benchmark state-of-the-art deep imputation methods on five time series health data sets. Our extensive analysis reveals that no single imputation method outperforms the others on all five data sets. Imputation performance depends on data types, individual variable statistics, missing rates, and missing value types. Deep learning methods that jointly perform cross-sectional (across variables) and longitudinal (across time) imputation of missing values in time series data yield statistically better data quality than traditional imputation methods. Although computationally expensive, deep learning methods are practical given the current availability of high-performance computing resources, especially when data quality and sample size are of paramount importance in healthcare informatics. Our findings highlight the importance of data-centric selection of imputation methods to optimize data-driven predictive models.
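The benchmarking idea in the abstract (hide values at a chosen missing rate, impute them, and score only the hidden entries) can be illustrated with a small sketch. This is not the authors' code or experimental protocol. It assumes missing-completely-at-random (MCAR) masking, mean absolute error (MAE) as the metric, and two simple non-deep baselines: per-variable mean imputation, which ignores time, and per-variable linear interpolation over time, a basic longitudinal method. All function names are hypothetical, and the toy sine/cosine signals are synthetic, not health data.

```python
import numpy as np

def mcar_mask(observed, rate, rng):
    """Hypothetical helper: hide a fraction `rate` of observed entries completely at random.
    Returns a boolean array that is True where a value is artificially hidden."""
    idx = np.flatnonzero(observed)
    n_hide = int(round(rate * idx.size))
    hide = rng.choice(idx, size=n_hide, replace=False)
    mask = np.zeros(observed.size, dtype=bool)
    mask[hide] = True
    return mask.reshape(observed.shape)

def impute_mean(x):
    """Non-temporal baseline: replace NaNs in each column (variable) with its observed mean."""
    out = x.copy()
    col_means = np.nanmean(out, axis=0)
    rows, cols = np.where(np.isnan(out))
    out[rows, cols] = col_means[cols]
    return out

def impute_linear_time(x):
    """Longitudinal baseline: linearly interpolate each variable over time.
    np.interp holds the first/last observed value constant beyond the ends."""
    out = x.copy()
    t = np.arange(x.shape[0])
    for j in range(x.shape[1]):
        obs = ~np.isnan(x[:, j])
        if obs.any():
            out[~obs, j] = np.interp(t[~obs], t[obs], x[obs, j])
    return out

def mae_on_hidden(truth, imputed, hidden):
    """Score only the artificially hidden entries, where ground truth is known."""
    return float(np.mean(np.abs(truth[hidden] - imputed[hidden])))

# Toy example: 200 time steps x 2 smooth synthetic variables, 30% MCAR missingness.
rng = np.random.default_rng(0)
steps = np.arange(200)
truth = np.column_stack([np.sin(steps / 10.0), np.cos(steps / 15.0)])
hidden = mcar_mask(~np.isnan(truth), rate=0.3, rng=rng)
x = truth.copy()
x[hidden] = np.nan

scores = {
    "mean": mae_on_hidden(truth, impute_mean(x), hidden),
    "linear-in-time": mae_on_hidden(truth, impute_linear_time(x), hidden),
}
for name, score in scores.items():
    print(f"{name}: MAE = {score:.3f}")
```

On smooth synthetic signals the longitudinal baseline should score a much lower MAE than mean imputation, because it exploits temporal order. The abstract's broader finding is that method rankings vary with data set, variable statistics, missing rate, and missing value type, so a toy comparison like this is illustrative only; it also covers just the MCAR case and none of the deep or cross-sectional methods the review benchmarks.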

Pages: 18

Cited references: 78 in total
