Binned Data Provide Better Imputation of Missing Time Series Data from Wearables

Cited by: 6
Authors
Chakrabarti, Shweta [1]
Biswas, Nupur [1]
Karnani, Khushi [2]
Padul, Vijay [1]
Jones, Lawrence D. [3]
Kesari, Santosh [4]
Ashili, Shashaanka [3]
Affiliations
[1] Rhenix Lifesciences, Hyderabad 500038, India
[2] Indian Inst Technol, Dept Biosci & Bioengn, Gauhati 781039, India
[3] CureScience, 5820 Oberlin Dr, 202, San Diego, CA 92121 USA
[4] Pacific Neurosci Inst, St Johns Canc Inst Providence St Johns Hlth Ctr, Dept Translat Neurosci, Santa Monica, CA 90404 USA
Keywords
imputation; missing data; time-series data; binning; wearables; multiple imputation
DOI
10.3390/s23031454
Chinese Library Classification
O65 [Analytical Chemistry]
Discipline classification code
070302; 081704
Abstract
The presence of missing values in a time-series dataset is a common and well-known problem. Various statistical and machine learning methods have been developed to fill in the missing values. However, the performance of these methods varies widely and depends strongly on the type of data and the correlations within it. In this study, we applied several well-known imputation methods (expectation maximization, k-nearest neighbor, iterative imputer, random forest, and simple imputer) to impute missing data obtained from smart, wearable health trackers. We propose the use of data binning for imputation and show that using data binned around the missing time interval provides better imputation than using the whole dataset. Imputation was performed for 15 min and 1 h of continuous missing data. We used bin sizes of 15 min, 30 min, 45 min, and 1 h, and evaluated the results using root mean square error (RMSE) values. The expectation maximization algorithm worked best with binned data, followed by the simple imputer, iterative imputer, and k-nearest neighbor, whereas binning had no effect on imputation with the random forest method. Moreover, the smallest bin sizes of 15 min and 1 h gave the lowest RMSE values for the majority of the time frames when imputing 15 min and 1 h of missing data, respectively. Although we applied it to digital health data, we think that this method will also find applicability in other domains.
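The idea of imputing from a bin around the gap, and scoring the result with RMSE, can be illustrated with a minimal sketch. It is not the authors' implementation. The synthetic one-sample-per-minute signal, the function names `rmse` and `mean_impute`, and the reading of a "bin" as an equal-length window before and after the gap are all assumptions made for illustration. Mean filling stands in for the simple imputer only.

```python
import math
import random
from statistics import mean


def rmse(truth, pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(truth, pred)))


def mean_impute(series, gap, bin_minutes=None):
    """Fill the samples in `gap` with the mean of observed values.

    bin_minutes=None uses every observed sample in the series.
    Otherwise only samples within `bin_minutes` before and after the gap
    are used (an assumed definition of a bin around the gap).
    """
    start, stop = gap.start, gap.stop
    if bin_minutes is None:
        lo, hi = 0, len(series)
    else:
        lo = max(0, start - bin_minutes)
        hi = min(len(series), stop + bin_minutes)
    observed = [series[i] for i in range(lo, hi) if not (start <= i < stop)]
    return [mean(observed)] * (stop - start)


# Synthetic signal (assumption): one sample per minute for one day,
# a slow daily cycle plus small Gaussian noise.
rng = random.Random(0)
series = [70 + 20 * math.sin(2 * math.pi * t / 1440) + rng.gauss(0, 1)
          for t in range(1440)]

gap = range(600, 615)            # 15 min of continuous missing data
truth = series[gap.start:gap.stop]

rmse_whole = rmse(truth, mean_impute(series, gap))
rmse_binned = rmse(truth, mean_impute(series, gap, bin_minutes=15))

print(f"whole dataset RMSE: {rmse_whole:.2f}")
print(f"15 min bin RMSE:    {rmse_binned:.2f}")
```

On this slowly varying synthetic signal the binned mean tracks the local level, so its RMSE is lower than that of the whole-dataset mean. Real wearable data may behave differently, and the other methods named in the abstract would need their own implementations.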
Pages: 15