Genomic data imputation with variational auto-encoders

Cited by: 44
Authors
Qiu, Yeping Lina [1, 2]
Zheng, Hong [1]
Gevaert, Olivier [1, 3]
Affiliations
[1] Stanford Univ, Stanford Ctr Biomed Informat Res, Dept Med, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Biomed Data Sci, Stanford, CA 94305 USA
Source
GIGASCIENCE | 2020 / Vol. 9 / Issue 08
Funding
National Institutes of Health (NIH), USA
Keywords
imputation; variational auto-encoder; deep learning; MISSING VALUE IMPUTATION; AUTOENCODERS; NETWORK;
DOI
10.1093/gigascience/giaa082
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Background: As missing values are frequently present in genomic data, practical methods to handle missing data are necessary for downstream analyses that require complete data sets. State-of-the-art imputation techniques, including methods based on singular value decomposition and K-nearest neighbors, can be computationally expensive for large data sets and it is difficult to modify these algorithms to handle certain cases not missing at random.
Results: In this work, we use a deep-learning framework based on the variational auto-encoder (VAE) for genomic missing value imputation and demonstrate its effectiveness in transcriptome and methylome data analysis. We show that in the vast majority of our testing scenarios, VAE achieves similar or better performances than the most widely used imputation standards, while having a computational advantage at evaluation time. When dealing with data missing not at random (e.g., few values are missing), we develop simple yet effective methodologies to leverage the prior knowledge about missing data. Furthermore, we investigate the effect of varying latent space regularization strength in VAE on the imputation performances and, in this context, show why VAE has a better imputation capacity compared to a regular deterministic auto-encoder.
Conclusions: We describe a deep learning imputation framework for transcriptome and methylome data using a VAE and show that it can be a preferable alternative to traditional methods for data imputation, especially in the setting of large-scale data and certain missing-not-at-random scenarios.
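The abstract mentions three ingredients that a short sketch may make more concrete: simulating missing values (at random or not at random), a VAE objective whose latent-space regularization strength can be varied, and scoring imputations on the hidden entries. The NumPy sketch below is illustrative only and is not taken from the paper. The function names, the choice of "lowest values are missing" as the not-at-random pattern, the squared-error reconstruction term, and the `beta` weighting of the KL term are all assumptions, following the commonly used beta-weighted VAE formulation with a diagonal Gaussian posterior and a standard normal prior.

```python
import numpy as np


def mask_random(x, frac, rng):
    """Hypothetical helper: hide roughly `frac` of entries uniformly at random."""
    return rng.random(x.shape) < frac


def mask_lowest(x, frac):
    """Hypothetical helper: hide the lowest `frac` of values.

    This is one assumed example of a missing-not-at-random pattern.
    Returns a boolean mask that is True where a value is treated as missing.
    """
    threshold = np.quantile(x, frac)
    return x <= threshold


def beta_vae_loss(x, x_hat, mu, logvar, beta=1.0):
    """Assumed objective: squared-error reconstruction plus beta-weighted KL.

    mu and logvar parameterize a diagonal Gaussian posterior per sample;
    the KL term is taken against a standard normal prior.
    """
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    kl = np.mean(-0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1))
    return recon + beta * kl


def masked_rmse(x_true, x_imputed, mask):
    """Root-mean-square error restricted to the entries that were hidden."""
    diff = (x_true - x_imputed)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))
```

In this sketch, raising `beta` penalizes posteriors that drift from the prior more heavily, while `beta = 0` leaves only the reconstruction term, which is closer in spirit to a deterministic auto-encoder objective.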
Pages: 12
Related papers
50 results in total
  • [1] Correlated Variational Auto-Encoders
    Tang, Da
    Liang, Dawen
    Jebara, Tony
    Ruozzi, Nicholas
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [2] Hyperspherical Variational Auto-Encoders
    Davidson, Tim R.
    Falorsi, Luca
    De Cao, Nicola
    Kipf, Thomas
    Tomczak, Jakub M.
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2018 : 856 - 865
  • [3] Monte Carlo Variational Auto-Encoders
    Thin, Achille
    Kotelevskii, Nikita
    Durmus, Alain
    Panov, Maxim
    Moulines, Eric
    Doucet, Arnaud
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139 : 7258 - 7267
  • [4] Consistency Regularization for Variational Auto-Encoders
    Sinha, Samarth
    Dieng, Adji B.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [5] Learning Generative Factors of EEG Data with Variational Auto-Encoders
    Zhdanov, Maksim
    Steinmann, Saskia
    Hoffmann, Nico
    DEEP GENERATIVE MODELS, DGM4MICCAI 2022, 2022, 13609 : 45 - 54
  • [6] Radon-Sobolev Variational Auto-Encoders
    Turinici, Gabriel
    NEURAL NETWORKS, 2021, 141 : 294 - 305
  • [7] Self-Supervised Variational Auto-Encoders
    Gatopoulos, Ioannis
    Tomczak, Jakub M.
    ENTROPY, 2021, 23 (06)
  • [8] InvMap and Witness Simplicial Variational Auto-Encoders
    Medbouhi, Aniss Aiman
    Polianskii, Vladislav
    Varava, Anastasia
    Kragic, Danica
    MACHINE LEARNING AND KNOWLEDGE EXTRACTION, 2023, 5 (01) : 199 - 236
  • [9] Variational auto-encoders based on the shift correction for imputation of specific missing in multivariate time series
    Li, Junying
    Ren, Weijie
    Han, Min
    MEASUREMENT, 2021, 186
  • [10] Automatic selection of latent variables in variational auto-encoders
    Jouffroy, Emma
    Giremus, Audrey
    Berthoumieu, Yannick
    Bach, Olivier
    Hugget, Alain
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022 : 1407 - 1411