A Multi-directional Approach for Missing Value Estimation in Multivariate Time Series Clinical Data

Cited by: 6
Authors
Xu, Xiao [1]
Liu, Xiaoshuang [1]
Kang, Yanni [1]
Xu, Xian [1]
Wang, Junmei [1]
Sun, Yuyao [1]
Chen, Quanhe [1]
Jia, Xiaoyu [1]
Ma, Xinyue [1]
Meng, Xiaoyan [1]
Li, Xiang [1]
Xie, Guotong [1]
Affiliation
[1] Ping Hlth Technol, Beijing, Peoples R China
Keywords
Multi-directional; Missing Value Estimation; Multivariate time series; Feature engineering; Gradient boosting tree; IMPUTATION;
DOI
10.1007/s41666-020-00076-2
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Missing values are common in clinical datasets and create obstacles for clinical data analysis. Correctly estimating the missing parts plays a critical role in applying these analysis approaches. However, only a limited number of works focus on missing value estimation for multivariate time series (MTS) clinical data, which is one of the most challenging data types in this area. We attempt to develop a methodology (MD-MTS) with high accuracy for missing value estimation in MTS clinical data. In MD-MTS, temporal and cross-variable information are constructed as multi-directional features for an efficient gradient boosting decision tree (LightGBM). For each patient, temporal information represents the sequential relations among the values of one variable at different time-stamps, and cross-variable information refers to the correlations among the values of different variables at a fixed time-stamp. We evaluated the performance of the estimation method based on the gap between the true values and the estimated values on the randomly masked parts. MD-MTS outperformed three baseline methods (3D-MICE, Amelia II and BRITS) on the ICHI challenge 2019 datasets, which contain 13 time series variables. The root-mean-square errors of MD-MTS, 3D-MICE, Amelia II and BRITS on the offline-test dataset are 0.1717, 0.2247, 0.1900, and 0.1862, respectively. On the online-test dataset, the root-mean-square errors of the first three methods are 0.1720, 0.2235, and 0.1927, respectively. Furthermore, MD-MTS ranked first in the ICHI challenge 2019 among dozens of competing models. MD-MTS provides an accurate and robust approach for estimating the missing values in MTS clinical data, and it can easily be used as a preprocessing step for downstream clinical data analysis.
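The two feature directions and the masked-value evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_features`, `mask_random`, `naive_estimate`, `rmse`), the choice of nearest previous/next observation as the temporal features, and the toy data are all assumptions made for illustration, and a simple neighbour average stands in for the trained LightGBM model.

```python
import math
import random


def build_features(series, t, v):
    """Hypothetical multi-directional features for variable v at time index t.

    series: list of rows (one per time-stamp); each row is a list of
    floats, with None marking a missing value.
    """
    # Temporal direction: nearest observed values of the same variable
    # before and after time index t (None if there is no such value).
    prev_val = next((series[i][v] for i in range(t - 1, -1, -1)
                     if series[i][v] is not None), None)
    next_val = next((series[i][v] for i in range(t + 1, len(series))
                     if series[i][v] is not None), None)
    # Cross-variable direction: the other variables at the same time-stamp.
    cross = [x for j, x in enumerate(series[t]) if j != v]
    return [prev_val, next_val] + cross


def mask_random(series, fraction, rng):
    """Hide a random fraction of the observed entries for evaluation."""
    observed = [(t, v) for t, row in enumerate(series)
                for v, x in enumerate(row) if x is not None]
    hidden = rng.sample(observed, max(1, int(len(observed) * fraction)))
    masked = [row[:] for row in series]  # copy so the input is untouched
    for t, v in hidden:
        masked[t][v] = None
    return masked, hidden


def naive_estimate(features):
    """Stand-in for a trained model: average of the temporal neighbours."""
    neighbours = [x for x in features[:2] if x is not None]
    return sum(neighbours) / len(neighbours) if neighbours else 0.0


def rmse(truth, estimate):
    """Root-mean-square error between true and estimated values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(truth, estimate))
                     / len(truth))


# Toy patient record: 4 time-stamps, 2 variables, values scaled to [0, 1].
series = [[0.1, 0.5], [0.2, 0.6], [0.3, 0.7], [0.4, 0.8]]
masked, hidden = mask_random(series, 0.25, random.Random(0))
truth = [series[t][v] for t, v in hidden]
estimates = [naive_estimate(build_features(masked, t, v)) for t, v in hidden]
print("RMSE on masked entries:", rmse(truth, estimates))
```

In a fuller pipeline, the feature vectors for observed entries would serve as training rows for a gradient boosting regressor, and the RMSE would be computed only on the entries that were deliberately masked.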
Pages: 365-382
Page count: 18
Related papers
50 in total
  • [21] A Versatile Approach to Classification of Multivariate Time Series Data
    Zagorecki, Adam
    PROCEEDINGS OF THE 2015 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2015, 5 : 407 - 410
  • [22] Estimation Method Based on MinMaxEnt Distribution for Missing Value in Time Series
    Shamilov, Aladdin
    Giriftinoglu, Cigdem
    PROCEEDINGS OF THE 8TH WSEAS INTERNATIONAL CONFERENCE ON SYSTEM SCIENCE AND SIMULATION IN ENGINEERING (ICOSSSE '09), 2009, : 206 - 210
  • [23] Application of Two-Directional Time Series Models to Replace Missing Data
    Huo, Jinsheng
    Cox, Chris D.
    Seaver, William L.
    Robinson, R. Bruce
    Jiang, Yan
    JOURNAL OF ENVIRONMENTAL ENGINEERING, 2010, 136 (04) : 435 - 443
  • [24] A Multi-granularity Network for Time Series Forecasting on Multivariate Time Series Data
    Wang, Zongqiang
    Xian, Yan
    Wang, Guoyin
    Yu, Hong
    ROUGH SETS, IJCRS 2023, 2023, 14481 : 324 - 338
  • [25] Atmospheric correction algorithm with multi-directional POLDER data
    Mitomi, Y
    Fukushima, H
    Takamura, T
    OCEAN OPTICS: REMOTE SENSING AND UNDERWATER IMAGING, 2002, 4488 : 233 - 237
  • [26] Multivariate Time Series Missing Data Imputation Using Recurrent Denoising Autoencoder
    Zhang, Jianye
    Yin, Peng
    2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2019, : 760 - 764
  • [27] Infinite hidden Markov models for multiple multivariate time series with missing data
    Hoskovec, Lauren
    Koslovsky, Matthew D.
    Koehler, Kirsten
    Good, Nicholas
    Peel, Jennifer L.
    Volckens, John
    Wilson, Ander
    BIOMETRICS, 2023, 79 (03) : 2592 - 2604
  • [28] An Observed Value Consistent Diffusion Model for Imputing Missing Values in Multivariate Time Series
    Wang, Xu
    Zhang, Hongbo
    Wang, Pengkun
    Zhang, Yudong
    Wang, Binwu
    Zhou, Zhengyang
    Wang, Yang
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 2409 - 2418
  • [29] Resource flows in a multi-directional integrated value creation model
    Arabie, Hope
    Fox, Corey J.
    Rayburn, Steven W.
    JOURNAL OF GENERAL MANAGEMENT, 2023,
  • [30] A FAST BLOCK MOTION ESTIMATION ALGORITHM WITH MULTI-DIRECTIONAL ADAPTATION
    Duanmu, C. J.
    Chen, Xing
    2009 INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS AND SIGNAL PROCESSING (WCSP 2009), 2009, : 1162 - 1165