Gaussian processes for missing value imputation

Cited by: 11
Authors
Jafrasteh, Bahram [1]
Hernandez-Lobato, Daniel [2]
Lubian-Lopez, Simon Pedro
Benavente-Fernandez, Isabel [1,3,4]
Affiliations
[1] Puerta Mar Univ, Biomed Res & Innovat Inst, Cadiz INiB Res Unit, Cadiz, Spain
[2] Univ Autonoma Madrid, Comp Sci Dept, Madrid, Spain
[3] Puerta Mar Univ Hosp, Dept Pediat, Div Neonatol, Cadiz, Spain
[4] Univ Cadiz, Med Sch, Dept Child & Mother Hlth & Radiol, Area Pediat, Cadiz, Spain
Keywords
Missing values; Gaussian process; Deep learning; Deep Gaussian processes; Variational inference
DOI
10.1016/j.knosys.2023.110603
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A missing value indicates that a particular attribute of an instance of a learning problem is not recorded. Missing values are very common in real-life datasets. Nevertheless, most machine learning methods cannot handle missing values, so they must be imputed before training. Gaussian processes (GPs) are non-parametric models with accurate uncertainty estimates that, combined with sparse approximations and stochastic variational inference, scale to large data sets. Sparse GPs (SGPs) can be used to obtain a predictive distribution for missing values. We present a hierarchical composition of sparse GPs that predicts the missing values in each dimension using the observed values from the other dimensions. Importantly, we consider that the input attributes to each sparse GP used for prediction may also have missing values. The missing values in those input attributes are replaced by the predictions of the previous sparse GPs in the hierarchy. We call our approach missing GP (MGP). MGP can impute all observed missing values. It outputs a predictive distribution for each missing value, which is then used in the imputation of other missing values. We evaluate MGP on one private clinical data set and on four UCI datasets with different percentages of missing values. Furthermore, we compare the performance of MGP with other state-of-the-art methods for imputing missing values, including variants based on sparse GPs and deep GPs. Our results show that the performance of MGP is significantly better. (c) 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
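The chaining idea in the abstract (predict each dimension from the others, and feed earlier predictions into later predictors) can be illustrated with a small sketch. This is not the authors' MGP: it swaps the sparse variational GPs for exact GP regression with a fixed RBF kernel, uses only the predictive mean, trains each regressor separately, and starts from a column-mean fill. The function names, the kernel hyperparameters, and the mean initialization are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared Euclidean distances between the rows of a and the rows of b.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_predict_mean(x_train, y_train, x_test, noise=1e-2):
    # Posterior mean of exact GP regression; targets are centered first.
    # (Assumption: fixed hyperparameters, no sparse approximation.)
    y_mean = y_train.mean()
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    alpha = np.linalg.solve(k, y_train - y_mean)
    return rbf_kernel(x_test, x_train) @ alpha + y_mean

def chained_gp_impute(x):
    # Hypothetical helper: impute NaNs dimension by dimension.
    # Assumes every column has at least one observed value.
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    # Assumed initialization: fill gaps with the observed column means.
    filled = np.where(missing, np.nanmean(x, axis=0), x)
    for d in range(x.shape[1]):
        rows = missing[:, d]
        if not rows.any():
            continue
        # Inputs are the other dimensions; gaps in them already hold
        # either the mean fill or the prediction of an earlier regressor.
        others = np.delete(filled, d, axis=1)
        filled[rows, d] = gp_predict_mean(others[~rows], filled[~rows, d], others[rows])
    return filled
```

Calling `chained_gp_impute` on an array containing `NaN` entries returns an array of the same shape with the observed entries unchanged. Unlike the approach described in the abstract, the sketch propagates only point predictions rather than predictive distributions.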
Pages: 12
Related papers
50 in total
  • [41] Missing value imputation method for disaster decision-making using K nearest neighbor
    Ma, Xiaofei
    Zhong, Qiuyan
    JOURNAL OF APPLIED STATISTICS, 2016, 43 (04) : 767 - 781
  • [42] An Experimental Survey of Missing Data Imputation Algorithms
    Miao, Xiaoye
    Wu, Yangyang
    Chen, Lu
    Gao, Yunjun
    Yin, Jianwei
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (07) : 6630 - 6650
  • [43] Missing Values Imputation Hypothesis: An Experimental Evaluation
    Li, Huaxiong
    Zhou, Xianzhong
    Yao, Yiyu
    PROCEEDINGS OF THE 8TH IEEE INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS, 2009, : 275 - +
  • [44] Four Factors Affecting Missing Data Imputation
    Hackl, Andreas
    Zeindl, Juergen
    Ehrlinger, Lisa
    35TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, SSDBM 2023, 2023,
  • [45] Missing data imputation in PLS-SEM
    Wang H.
    Lu S.
    Liu Y.
    Quality & Quantity, 2022, 56 (6) : 4777 - 4795
  • [46] Advances in Biomedical Missing Data Imputation: A Survey
    Barrabes, Miriam
    Perera, Maria
    Novelle Moriano, Victor
    Giro-I-Nieto, Xavier
    Mas Montserrat, Daniel
    Ioannidis, Alexander G.
    IEEE ACCESS, 2025, 13 : 16918 - 16932
  • [47] A Comprehensive Survey on Traffic Missing Data Imputation
    Zhang, Yimei
    Kong, Xiangjie
    Zhou, Wenfeng
    Liu, Jin
    Fu, Yanjie
    Shen, Guojiang
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (12) : 19252 - 19275
  • [48] A novel framework for imputation of missing values in databases
    Farhangfar, Alireza
    Kurgan, Lukasz A.
    Pedrycz, Witold
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2007, 37 (05) : 692 - 709
  • [49] Generative adversarial learning for missing data imputation
    Xinyang Wang
    Hongyu Chen
    Jiayu Zhang
    Jicong Fan
    Neural Computing and Applications, 2025, 37 (3) : 1403 - 1416
  • [50] A hybridization of multiple imputation and one-class bagging ensemble approach for missing value and class imbalance problem
    Baro, Pranita
    Borah, Malaya Dutta
    EVOLVING SYSTEMS, 2024, 15 (06) : 2021 - 2066