Imputing missing values in unevenly spaced clinical time series data to build an effective temporal classification framework

Cited by: 15
Authors
Nancy, Jane Y. [1 ]
Khanna, Nehemiah H. [1 ]
Arputharaj, Kannan [2 ]
Affiliations
[1] Anna Univ, Ramanujan Comp Ctr, Madras 600025, Tamil Nadu, India
[2] Anna Univ, Dept Informat Sci & Technol, Madras 600025, Tamil Nadu, India
Keywords
Time series; Missing value; Tolerance rough set; Particle swarm optimization; Inverse distance weight; HOT DECK; SPATIAL INTERPOLATION; MULTIPLE IMPUTATION;
DOI
10.1016/j.csda.2017.02.012
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
BACKGROUND: In the healthcare domain, clinical trials generate time-stamped data that record a set of observations on patient health status. These data are prone to missing values because patient observations are sometimes neither made regularly nor updated correctly. OBJECTIVE: This paper aims to impute missing values in unevenly spaced clinical time series data by proposing a tolerance rough set induced bio-statistical (TRiBS) framework. The proposed framework adopts the inverse distance weight (IDW) interpolation technique and improves it using the concepts of tolerance rough set (TR) and particle swarm optimization (PSO). METHOD: When interpolating an unknown data point, classical IDW interpolation suffers from two major drawbacks: first, selecting the known data points, and second, choosing an optimal influence factor. The TRiBS framework overcomes the first limitation using TR and the second using PSO. TR derives the dependent attributes for each attribute using non-missing records. The nearest significant set is then generated for each missing value based on its attribute dependencies. The PSO technique fixes the weights for the data in a nearest significant set by finding an optimized influence factor. The resulting significant set and its influence factor are used in IDW computations to impute the missing value. RESULT: The proposed work is evaluated on clinical time series datasets of hepatitis and thrombosis patients. However, the proposed system can support other clinical time series datasets with minor domain-specific changes. CONCLUSION: The performance of the imputed results demonstrates the effectiveness of TRiBS. Experimental evaluation with classifiers such as neural networks, support vector machine (SVM) and decision tree shows an improvement in classification accuracy when missing data are pre-processed with the proposed framework. (C) 2017 Elsevier B.V. All rights reserved.
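For orientation, the classical IDW step that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of plain inverse distance weighting on an unevenly spaced series, not the TRiBS method: here the known points are simply whatever the caller passes in and the influence factor (`power`) is a fixed argument, whereas the paper selects the points via tolerance rough sets and tunes the influence factor with PSO. The function name `idw_impute` and its parameters are hypothetical.

```python
from typing import Sequence


def idw_impute(
    t_known: Sequence[float],
    v_known: Sequence[float],
    t_missing: float,
    power: float = 2.0,
) -> float:
    """Estimate the value at t_missing by classical inverse distance weighting.

    Assumption: distance is the absolute gap between time stamps, and
    `power` plays the role of the influence factor (fixed here, not tuned).
    """
    if len(t_known) != len(v_known) or len(t_known) == 0:
        raise ValueError("need equally long, non-empty time and value lists")

    weighted_sum = 0.0
    weight_total = 0.0
    for t, v in zip(t_known, v_known):
        distance = abs(t - t_missing)
        if distance == 0:
            # An observation exists at exactly this time stamp; use it as is.
            return float(v)
        weight = 1.0 / distance ** power  # closer observations weigh more
        weighted_sum += weight * v
        weight_total += weight
    return weighted_sum / weight_total


# Unevenly spaced observations at t = 0 and t = 3; estimate the value at t = 1.
estimate = idw_impute([0.0, 3.0], [10.0, 20.0], t_missing=1.0, power=2.0)
print(estimate)
```

With `power=2.0` the observation at t = 0 (distance 1) receives weight 1 and the one at t = 3 (distance 2) receives weight 0.25, so the estimate is pulled toward the nearer value. Raising `power` strengthens that pull, which is why the choice of influence factor matters.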
Pages: 63-79
Number of pages: 17