Imputing missing values in unevenly spaced clinical time series data to build an effective temporal classification framework

Cited by: 15
Authors
Nancy, Jane Y. [1 ]
Khanna, Nehemiah H. [1 ]
Arputharaj, Kannan [2 ]
Affiliations
[1] Anna Univ, Ramanujan Comp Ctr, Madras 600025, Tamil Nadu, India
[2] Anna Univ, Dept Informat Sci & Technol, Madras 600025, Tamil Nadu, India
Keywords
Time series; Missing value; Tolerance rough set; Particle swarm optimization; Inverse distance weight; HOT DECK; SPATIAL INTERPOLATION; MULTIPLE IMPUTATION;
DOI
10.1016/j.csda.2017.02.012
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
BACKGROUND: In the healthcare domain, clinical trials generate time-stamped data that record a set of observations on patient health status. These data are prone to missing values because patient observations are sometimes neither made regularly nor updated correctly. OBJECTIVE: This paper aims to impute missing values in unevenly spaced clinical time series data by proposing a tolerance rough set induced bio-statistical (TRiBS) framework. The proposed framework adopts the inverse distance weight (IDW) interpolation technique and improves it using the concepts of tolerance rough set (TR) and particle swarm optimization (PSO). METHOD: When interpolating an unknown data point, classical IDW interpolation suffers from two major drawbacks: first, in selecting the known data points, and second, in choosing an optimal influence factor. The TRiBS framework overcomes the first limitation using TR and the second using PSO. TR derives the dependent attributes for each attribute using non-missing records. The nearest significant set is then generated for each missing value based on its attribute dependencies. The PSO technique fixes the weights for the data in a nearest significant set by finding an optimized influence factor. The obtained significant set and its influence factor are used in the IDW computations to impute the missing value. RESULT: The proposed framework is evaluated on clinical time series datasets of hepatitis and thrombosis patients. However, the proposed system can support other clinical time series datasets with minor domain-specific changes. CONCLUSION: The performance of the imputed results demonstrates the effectiveness of TRiBS. Experimental evaluation with classifiers such as neural networks, support vector machine (SVM) and decision tree has shown an improvement in classification accuracy when missing data are pre-processed with the proposed framework. (C) 2017 Elsevier B.V. All rights reserved.
Pages: 63-79
Page count: 17