Imputing missing values in unevenly spaced clinical time series data to build an effective temporal classification framework

Cited by: 15
Authors
Nancy, Jane Y. [1 ]
Khanna, Nehemiah H. [1 ]
Arputharaj, Kannan [2 ]
Affiliations
[1] Anna Univ, Ramanujan Comp Ctr, Madras 600025, Tamil Nadu, India
[2] Anna Univ, Dept Informat Sci & Technol, Madras 600025, Tamil Nadu, India
Keywords
Time series; Missing value; Tolerance rough set; Particle swarm optimization; Inverse distance weight; HOT DECK; SPATIAL INTERPOLATION; MULTIPLE IMPUTATION;
DOI
10.1016/j.csda.2017.02.012
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
BACKGROUND: In the healthcare domain, clinical trials generate time-stamped data that record a set of observations on patient health status. These data are prone to missing values because patient observations are sometimes neither made regularly nor recorded correctly.
OBJECTIVE: This paper aims to impute missing values in unevenly spaced clinical time series data by proposing a tolerance rough set induced bio-statistical (TRiBS) framework. The proposed framework adopts the inverse distance weight (IDW) interpolation technique and improves it using the concepts of tolerance rough set (TR) and particle swarm optimization (PSO).
METHOD: When interpolating an unknown data point, classical IDW interpolation suffers from two major drawbacks: first, selecting the known data points, and second, choosing an optimal influence factor. The TRiBS framework overcomes the first limitation using TR and the second using PSO. TR derives the dependent attributes for each attribute using non-missing records. The nearest significant set is then generated for each missing value based on its attribute dependencies. The PSO technique fixes the weights for the data in a nearest significant set by finding an optimized influence factor. The obtained significant set and its influence factor are used in the IDW computation to impute the missing value.
RESULT: The proposed framework is evaluated on clinical time series datasets of hepatitis and thrombosis patients. However, the proposed system can support other clinical time series datasets with minor domain-specific changes.
CONCLUSION: The performance of the imputed results demonstrates the effectiveness of TRiBS. Experimental evaluation with classifiers such as neural networks, support vector machine (SVM) and decision tree shows an improvement in classification accuracy when data with missing values are pre-processed with the proposed framework. © 2017 Elsevier B.V. All rights reserved.
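The classical IDW step that the abstract builds on can be illustrated with a minimal sketch. This is not the TRiBS implementation: the function name `idw_impute`, its parameters, and the use of the time gap as the distance are assumptions made for illustration. The known points are passed in directly (in TRiBS they would be the nearest significant set derived with TR), and the influence factor `power` is a fixed argument (in TRiBS it would be tuned by PSO).

```python
from typing import Sequence


def idw_impute(
    t_missing: float,
    times: Sequence[float],
    values: Sequence[float],
    power: float = 2.0,
) -> float:
    """Estimate the value at t_missing from known (time, value) pairs.

    Hypothetical helper: each known value is weighted by 1 / distance**power,
    where distance is the absolute time gap to the missing point and `power`
    plays the role of the influence factor.
    """
    if len(times) != len(values) or len(times) == 0:
        raise ValueError("times and values must be non-empty and of equal length")

    weighted_sum = 0.0
    weight_total = 0.0
    for t, v in zip(times, values):
        distance = abs(t_missing - t)
        if distance == 0.0:
            # An observation exists at exactly this time stamp; use it as-is.
            return float(v)
        weight = 1.0 / distance ** power
        weighted_sum += weight * v
        weight_total += weight
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Unevenly spaced observations (days since admission, lab value); made-up numbers.
    days = [0.0, 3.0, 10.0]
    readings = [40.0, 46.0, 60.0]
    print(idw_impute(4.0, days, readings, power=2.0))
```

A larger `power` concentrates the weight on the closest observations, which is why the choice of influence factor matters when the sampling intervals are uneven.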
Pages: 63-79
Number of pages: 17
Related papers
50 in total
  • [21] Estimating Missing Values in Multivariate-Time-Series Clinical Data using Gradient Boosting Tree on Temporal and Cross-Variable Features
    Xu, Xiao
    Wang, Junmei
    Xu, Xian
    Sun, Yuyao
    Chen, Quanhe
    Li, Xiang
    Xie, Guotong
    2019 IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI), 2019, : 541 - 543
  • [22] Imputing missing data in non-renewable empower time series from night-time lights observations
    Neri, Laura
    Coscieme, Luca
    Giannetti, Biagio F.
    Pulselli, Federico M.
    ECOLOGICAL INDICATORS, 2018, 84 : 106 - 118
  • [23] Classifying unevenly spaced clinical time series data using forecast error approximation based bottom-up (FeAB) segmented time delay neural network
    Jane, Y. Nancy
    Nehemiah, H. Khanna
    Kannan, Arputharaj
    COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING-IMAGING AND VISUALIZATION, 2021, 9 (01): : 92 - 105
  • [24] Missing data imputation of high-resolution temporal climate time series data
    Afrifa-Yamoah, E.
    Mueller, U. A.
    Taylor, S. M.
    Fisher, A. J.
    METEOROLOGICAL APPLICATIONS, 2020, 27 (01)
  • [25] Cyclic Gate Recurrent Neural Networks for Time Series Data with Missing Values
    Philip B. Weerakody
    Kok Wai Wong
    Guanjin Wang
    Neural Processing Letters, 2023, 55 : 1527 - 1554
  • [26] Deep imputation of missing values in time series health data: A review with benchmarking
    Kazijevs, Maksims
    Samad, Manar D.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2023, 144
  • [27] Cyclic Gate Recurrent Neural Networks for Time Series Data with Missing Values
    Weerakody, Philip B.
    Wong, Kok Wai
    Wang, Guanjin
    NEURAL PROCESSING LETTERS, 2023, 55 (02) : 1527 - 1554
  • [28] A novel imputation method for missing values in air pollutant time series data
    Pena, Mario
    Ortega, Patricia
    Orellana, Marcos
    2019 IEEE LATIN AMERICAN CONFERENCE ON COMPUTATIONAL INTELLIGENCE (LA-CCI), 2019, : 99 - 104
  • [29] STUDIES IN ASTRONOMICAL TIME-SERIES ANALYSIS .2. STATISTICAL ASPECTS OF SPECTRAL-ANALYSIS OF UNEVENLY SPACED DATA
    SCARGLE, JD
    ASTROPHYSICAL JOURNAL, 1982, 263 (02): : 835 - 853
  • [30] IMPUTATION FOR CONSECUTIVE MISSING VALUES IN NON-STATIONARY TIME SERIES DATA
    Wongoutong, Chantha
    ADVANCES AND APPLICATIONS IN STATISTICS, 2020, 64 (01) : 87 - 102