Handling incomplete heterogeneous data using VAEs

Cited by: 147
Authors
Nazabal, Alfredo [1]
Olmos, Pablo M. [2]
Ghahramani, Zoubin [3, 4]
Valera, Isabel [5, 6]
Affiliations
[1] Alan Turing Inst, London, England
[2] Univ Carlos III, Madrid, Spain
[3] Univ Cambridge, Cambridge, England
[4] Uber AI Labs, San Francisco, CA USA
[5] Max Planck Inst Intelligent Syst, Tubingen, Germany
[6] Saarland Univ, Dept Comp Sci, Saarbrucken, Germany
Funding
Engineering and Physical Sciences Research Council (UK); European Research Council
Keywords
Generative models; Variational autoencoders; Incomplete heterogeneous data; IMPUTATION; MODELS
DOI
10.1016/j.patcog.2020.107501
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Variational autoencoders (VAEs), like other generative models, have been shown to capture the latent structure of large volumes of complex, high-dimensional data efficiently and accurately. However, existing VAEs still cannot directly handle data that are heterogeneous (mixed continuous and discrete) or incomplete (with data missing at random), both of which are common in real-world applications. In this paper, we propose a general framework for designing VAEs suitable for fitting incomplete heterogeneous data. The proposed HI-VAE includes likelihood models for real-valued, positive real-valued, interval, categorical, ordinal, and count data, and allows accurate estimation (and potentially imputation) of missing data. Furthermore, HI-VAE shows competitive predictive performance in supervised tasks, outperforming supervised models when trained on incomplete data. (C) 2020 Elsevier Ltd. All rights reserved.
Pages: 11