Imputing missing values in unevenly spaced clinical time series data to build an effective temporal classification framework

Cited by: 15
Authors
Nancy, Jane Y. [1 ]
Khanna, Nehemiah H. [1 ]
Arputharaj, Kannan [2 ]
Affiliations
[1] Anna Univ, Ramanujan Comp Ctr, Madras 600025, Tamil Nadu, India
[2] Anna Univ, Dept Informat Sci & Technol, Madras 600025, Tamil Nadu, India
Keywords
Time series; Missing value; Tolerance rough set; Particle swarm optimization; Inverse distance weight; HOT DECK; SPATIAL INTERPOLATION; MULTIPLE IMPUTATION
DOI
10.1016/j.csda.2017.02.012
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
BACKGROUND: In the healthcare domain, clinical trials generate time-stamped data that record a set of observations on patient health status. These data are prone to missing values because patient observations are not always taken regularly or updated correctly. OBJECTIVE: This paper aims to impute missing values in unevenly spaced clinical time series data by proposing a tolerance rough set induced bio-statistical (TRiBS) framework. The proposed framework adopts an inverse distance weight (IDW) interpolation technique and improves it using the concepts of tolerance rough set (TR) and particle swarm optimization (PSO). METHOD: When interpolating an unknown data point, classical IDW interpolation suffers from two major drawbacks: first, selecting the known data points, and second, choosing an optimal influence factor. The TRiBS framework overcomes the first limitation using TR and the second using PSO. TR derives the dependent attributes for each attribute using non-missing records. The nearest significant set is then generated for each missing value based on its attribute dependencies. The PSO technique fixes the weights for the data in a nearest significant set by finding an optimized influence factor. The resulting significant set and its influence factor are used in the IDW computations to impute the missing value. RESULT: The proposed framework is evaluated on clinical time series datasets of hepatitis and thrombosis patients. However, the proposed system can support other clinical time series datasets with minor domain-specific changes. CONCLUSION: The performance of the imputed results demonstrates the effectiveness of TRiBS. Experimental evaluation with classifiers such as neural networks, support vector machine (SVM) and decision tree shows an improvement in classification accuracy when missing data are pre-processed with the proposed framework. © 2017 Elsevier B.V. All rights reserved.
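The classical IDW step that the abstract builds on can be illustrated with a minimal sketch. This is not the TRiBS implementation: the function name `idw_impute`, the parameter name `power` (standing in for the influence factor), and the sample readings are assumptions made for illustration. The tolerance rough set selection of the nearest significant set and the PSO search for the influence factor are not reproduced here; the sketch simply uses all known points and a fixed exponent.

```python
from typing import Sequence


def idw_impute(
    times: Sequence[float],
    values: Sequence[float],
    t_missing: float,
    power: float = 2.0,
) -> float:
    """Estimate the value at t_missing from known (time, value) pairs.

    Each known point is weighted by 1 / distance**power, where distance is
    the absolute gap in time. `power` plays the role of the influence
    factor (hypothetical name): larger values favour the closest points.
    """
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    if not times:
        raise ValueError("at least one known point is required")

    weighted_sum = 0.0
    weight_total = 0.0
    for t, v in zip(times, values):
        distance = abs(t - t_missing)
        if distance == 0:
            # An observation exists at exactly this time stamp; use it.
            return float(v)
        weight = 1.0 / distance ** power
        weighted_sum += weight * v
        weight_total += weight
    return weighted_sum / weight_total


# Illustrative, made-up readings taken at uneven intervals (days 0, 1, 4).
known_times = [0.0, 1.0, 4.0]
known_values = [10.0, 20.0, 40.0]
estimate = idw_impute(known_times, known_values, t_missing=2.0)
```

Because the weights depend only on time gaps, the sketch handles unevenly spaced observations without resampling. In the framework described above, the known points would instead be restricted to the nearest significant set, and the exponent would be tuned rather than fixed.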
Pages: 63-79
Number of pages: 17
Related papers
50 in total
  • [41] Combining attention with spectrum to handle missing values on time series data without imputation
    Chen, Yen-Pin
    Huang, Chien-Hua
    Lo, Yuan-Hsun
    Chen, Yi-Ying
    Lai, Feipei
    INFORMATION SCIENCES, 2022, 609 : 1271 - 1287
  • [42] Temporal 2D-cycle-generation framework for time series classification
    Chen, Xi
    Jin, Xiu
    Zhang, Hua
    Xiong, Jianghui
    Deng, Youhui
    Zhang, Xiaodan
    APPLIED SOFT COMPUTING, 2025, 171
  • [43] Spectral Temporal Information for Missing Data Reconstruction (STIMDR) of Landsat Reflectance Time Series
    Tang, Zhipeng
    Amatulli, Giuseppe
    Pellikka, Petri K. E.
    Heiskanen, Janne
    REMOTE SENSING, 2022, 14 (01)
  • [44] TIformer: A Transformer-Based Framework for Time-Series Forecasting with Missing Data
    Ding, Zuocheng
    Chen, Yufan
    Wang, Hanchen
    Wang, Xiaoyang
    Zhang, Wenjie
    Zhang, Ying
    DATABASES THEORY AND APPLICATIONS, ADC 2024, 2025, 15449 : 71 - 84
  • [45] EFFECTS OF MISSING DATA ON THE STATISTICAL-ANALYSIS OF CLINICAL TIME-SERIES
    RANKIN, ED
    MARSH, JC
    SOCIAL WORK RESEARCH & ABSTRACTS, 1985, 21 (02) : 13 - 16
  • [46] An Enhanced Machine Learning Framework for Type 2 Diabetes Classification Using Imbalanced Data with Missing Values
    Roy, Kumarmangal
    Ahmad, Muneer
    Waqar, Kinza
    Priyaah, Kirthanaah
    Nebhen, Jamel
    Alshamrani, Sultan S.
    Raza, Muhammad Ahsan
    Ali, Ihsan
    COMPLEXITY, 2021, 2021
  • [48] Comparison of Estimating Missing Values in IoT Time Series Data Using Different Interpolation Algorithms
    Ding, Zengyu
    Mei, Gang
    Cuomo, Salvatore
    Li, Yixuan
    Xu, Nengxiong
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2020, 48 (03) : 534 - 548
  • [49] What to Do about Missing Values in Time-Series Cross-Section Data
    Honaker, James
    King, Gary
    AMERICAN JOURNAL OF POLITICAL SCIENCE, 2010, 54 (02) : 561 - 581
  • [50] Handling Missing Values in Interrupted Time Series Analysis of Longitudinal Individual-Level Data
    Bazo-Alvarez, Juan Carlos
    Morris, Tim P.
    Tra My Pham
    Carpenter, James R.
    Petersen, Irene
    CLINICAL EPIDEMIOLOGY, 2020, 12 : 1045 - 1057