Gaussian processes for missing value imputation

Cited by: 10
Authors
Jafrasteh, Bahram [1]
Hernandez-Lobato, Daniel [2]
Lubian-Lopez, Simon Pedro
Benavente-Fernandez, Isabel [1, 3, 4]
Affiliations
[1] Puerta Mar Univ, Biomed Res & Innovat Inst, Cadiz INiB Res Unit, Cadiz, Spain
[2] Univ Autonoma Madrid, Comp Sci Dept, Madrid, Spain
[3] Puerta Mar Univ Hosp, Dept Pediat, Div Neonatol, Cadiz, Spain
[4] Univ Cadiz, Med Sch, Dept Child & Mother Hlth & Radiol, Area Pediat, Cadiz, Spain
Keywords
Missing values; Gaussian process; Deep learning; Deep Gaussian processes; Variational inference
DOI
10.1016/j.knosys.2023.110603
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A missing value indicates that a particular attribute of an instance of a learning problem is not recorded. Missing values are very common in many real-life datasets. Despite this, most machine learning methods cannot handle missing values, so they should be imputed before training. Gaussian Processes (GPs) are non-parametric models with accurate uncertainty estimates that, combined with sparse approximations and stochastic variational inference, scale to large data sets. Sparse GPs (SGPs) can be used to get a predictive distribution for missing values. We present a hierarchical composition of sparse GPs that is used to predict the missing values at each dimension using the observed values from the other dimensions. Importantly, we consider that the input attributes to each sparse GP used for prediction may also have missing values. The missing values in those input attributes are replaced by the predictions of the previous sparse GPs in the hierarchy. We call our approach missing GP (MGP). MGP can impute all observed missing values. It outputs a predictive distribution for each missing value that is then used in the imputation of other missing values. We evaluate MGP on one private clinical data set and on four UCI datasets with different percentages of missing values. Furthermore, we compare the performance of MGP with other state-of-the-art methods for imputing missing values, including variants based on sparse GPs and deep GPs. Our results show that the performance of MGP is significantly better. © 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
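The chained idea in the abstract (one GP per dimension, with inputs that may themselves contain earlier imputations) can be illustrated with a rough sketch. This is not the authors' MGP: it swaps in exact GP regression with a fixed RBF kernel in place of sparse GPs trained by stochastic variational inference, passes only predictive means (not full distributions) down the chain, and processes dimensions in order of increasing missingness, which is an assumption made here for illustration. The names `rbf_kernel`, `gp_predict`, and `chained_gp_impute` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between the rows of A and the rows of B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_predict(X_train, y_train, X_test, noise=1e-2):
    # Exact GP regression (assumption: stands in for a sparse variational GP).
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_test, X_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    mean = K_star @ alpha + y_mean
    v = np.linalg.solve(L, K_star.T)
    var = rbf_kernel(X_test, X_test).diagonal() - np.sum(v**2, axis=0) + noise
    return mean, var

def chained_gp_impute(X):
    # Impute NaNs column by column; each GP sees the current filled-in inputs,
    # so later GPs consume the predictions made by earlier ones.
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)  # mean-fill as a start
    variances = np.zeros_like(filled)
    order = np.argsort(missing.sum(axis=0))  # assumed ordering: fewest gaps first
    for d in order:
        rows = missing[:, d]
        if not rows.any() or rows.all():
            continue  # nothing to impute, or no observed targets to learn from
        others = np.delete(np.arange(X.shape[1]), d)
        inputs = filled[:, others]
        mean, var = gp_predict(inputs[~rows], filled[~rows, d], inputs[rows])
        filled[rows, d] = mean
        variances[rows, d] = var
    return filled, variances
```

Calling `chained_gp_impute(X)` on an array with `NaN` entries returns the completed array together with a per-entry predictive variance (zero for observed entries). A closer reproduction of the paper's approach would need sparse GPs, learned hyperparameters, and propagation of the predictive distributions rather than point estimates.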
Pages: 12