Genomic data imputation with variational auto-encoders

Cited by: 44
Authors
Qiu, Yeping Lina [1, 2]
Zheng, Hong [1]
Gevaert, Olivier [1, 3]
Affiliations
[1] Stanford Univ, Stanford Ctr Biomed Informat Res, Dept Med, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Biomed Data Sci, Stanford, CA 94305 USA
Source
GIGASCIENCE | 2020 / Vol. 9 / Issue 08
Funding
National Institutes of Health (USA);
Keywords
imputation; variational auto-encoder; deep learning; MISSING VALUE IMPUTATION; AUTOENCODERS; NETWORK;
DOI
10.1093/gigascience/giaa082
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Background: As missing values are frequently present in genomic data, practical methods to handle missing data are necessary for downstream analyses that require complete data sets. State-of-the-art imputation techniques, including methods based on singular value decomposition and K-nearest neighbors, can be computationally expensive for large data sets, and it is difficult to modify these algorithms to handle certain cases not missing at random. Results: In this work, we use a deep-learning framework based on the variational auto-encoder (VAE) for genomic missing value imputation and demonstrate its effectiveness in transcriptome and methylome data analysis. We show that in the vast majority of our testing scenarios, VAE achieves similar or better performance than the most widely used imputation standards, while having a computational advantage at evaluation time. When dealing with data missing not at random (e.g., few values are missing), we develop simple yet effective methodologies to leverage the prior knowledge about missing data. Furthermore, we investigate the effect of varying latent space regularization strength in VAE on imputation performance and, in this context, show why VAE has a better imputation capacity compared to a regular deterministic auto-encoder. Conclusions: We describe a deep-learning imputation framework for transcriptome and methylome data using a VAE and show that it can be a preferable alternative to traditional methods for data imputation, especially in the setting of large-scale data and certain missing-not-at-random scenarios.
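The two ideas the abstract leans on, a latent-space regularization strength and a comparison of imputers on data with known missing entries, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `beta` weight on the KL term, the squared-error reconstruction term, the random masking scheme, the RMSE score, and the column-mean imputer that stands in for a trained model.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I), one sample."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Squared-error reconstruction plus a beta-weighted KL penalty.

    beta is a hypothetical regularization-strength knob; beta = 0 drops the
    latent penalty entirely.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + beta * kl_to_standard_normal(mu, log_var)


def mask_at_random(rows, frac, rng):
    """Return a copy of rows with a fraction of entries hidden (None),
    plus the list of hidden (row, column) positions."""
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    hidden = rng.sample(cells, int(round(frac * len(cells))))
    masked = [list(row) for row in rows]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_column_mean(masked):
    """Stand-in imputer: fill each hidden entry with its column's observed mean."""
    n_cols = len(masked[0])
    means = []
    for j in range(n_cols):
        seen = [row[j] for row in masked if row[j] is not None]
        means.append(sum(seen) / len(seen) if seen else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in masked]


def rmse_on_hidden(truth, imputed, hidden):
    """Root-mean-square error restricted to the entries that were hidden."""
    sq = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in hidden]
    return math.sqrt(sum(sq) / len(sq))


# Synthetic data only; no real genomic values are used here.
rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(40)]
masked, hidden = mask_at_random(data, 0.1, rng)
score = rmse_on_hidden(data, impute_column_mean(masked), hidden)
print(f"hidden entries: {len(hidden)}, RMSE of stand-in imputer: {score:.3f}")
```

A trained model would replace `impute_column_mean` in this harness, and scoring only the hidden entries keeps the comparison between imputers focused on values the model never saw.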
Pages: 12
Related papers
50 in total
  • [21] Continuous imputation of missing values in time series via Wasserstein generative adversarial imputation networks and variational auto-encoders model
    Wang, Yunsheng
    Xu, Xinghan
    Hu, Lei
    Liu, Jianwei
    Yan, Xiaohui
    Ren, Weijie
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2024, 647
  • [22] Fisher Auto-Encoders
    Elkhalil, Khalil
    Hasan, Ali
    Ding, Jie
    Farsiu, Sina
    Tarokh, Vahid
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130 : 352 - 360
  • [23] Ornstein Auto-Encoders
    Choi, Youngwon
    Won, Joong-Ho
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2172 - 2178
  • [24] Interpretable and effective hashing via Bernoulli variational auto-encoders
    Mena, Francisco
    Nanculef, Ricardo
    Valle, Carlos
    INTELLIGENT DATA ANALYSIS, 2020, 24 (24) : S141 - S166
  • [25] Data-driven Dimensional Expression Generation via Encapsulated Variational Auto-Encoders
    Wenjun Bai
    Changqin Quan
    Zhi-Wei Luo
    Cognitive Computation, 2023, 15 : 1342 - 1354
  • [26] Time-sequential variational conditional auto-encoders for recommendation
    Hozumi J.
    Iwasawa Y.
    Matsuo Y.
    1600, Japanese Society for Artificial Intelligence (36):
  • [27] Description Generation Using Variational Auto-Encoders for Precursor microRNA
    Petkovic, Marko
    Menkovski, Vlado
    ENTROPY, 2024, 26 (11)
  • [28] Ensemble kalman variational objective: a variational inference framework for sequential variational auto-encoders
    Ishizone, Tsuyoshi
    Higuchi, Tomoyuki
    Nakamura, Kazuyuki
    IEICE NONLINEAR THEORY AND ITS APPLICATIONS, 2023, 14 (04): : 691 - 717
  • [29] Data-driven Dimensional Expression Generation via Encapsulated Variational Auto-Encoders
    Bai, Wenjun
    Quan, Changqin
    Luo, Zhi-Wei
    COGNITIVE COMPUTATION, 2023, 15 (04) : 1342 - 1354
  • [30] Understanding Instance-based Interpretability of Variational Auto-Encoders
    Kong, Zhifeng
    Chaudhuri, Kamalika
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,