Genomic data imputation with variational auto-encoders

Cited by: 44
Authors
Qiu, Yeping Lina [1, 2]
Zheng, Hong [1]
Gevaert, Olivier [1, 3]
Affiliations
[1] Stanford Univ, Stanford Ctr Biomed Informat Res, Dept Med, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Biomed Data Sci, Stanford, CA 94305 USA
Source
GIGASCIENCE | 2020 / Vol. 9 / Issue 08
Funding
National Institutes of Health (USA);
Keywords
imputation; variational auto-encoder; deep learning; MISSING VALUE IMPUTATION; AUTOENCODERS; NETWORK;
DOI
10.1093/gigascience/giaa082
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
Background: As missing values are frequently present in genomic data, practical methods to handle missing data are necessary for downstream analyses that require complete data sets. State-of-the-art imputation techniques, including methods based on singular value decomposition and K-nearest neighbors, can be computationally expensive for large data sets, and it is difficult to modify these algorithms to handle certain cases not missing at random. Results: In this work, we use a deep-learning framework based on the variational auto-encoder (VAE) for genomic missing value imputation and demonstrate its effectiveness in transcriptome and methylome data analysis. We show that in the vast majority of our testing scenarios, VAE achieves similar or better performance than the most widely used imputation standards, while having a computational advantage at evaluation time. When dealing with data missing not at random (e.g., low values are missing), we develop simple yet effective methodologies to leverage the prior knowledge about missing data. Furthermore, we investigate the effect of varying latent space regularization strength in VAE on the imputation performance and, in this context, show why VAE has a better imputation capacity compared to a regular deterministic auto-encoder. Conclusions: We describe a deep learning imputation framework for transcriptome and methylome data using a VAE and show that it can be a preferable alternative to traditional methods for data imputation, especially in the setting of large-scale data and certain missing-not-at-random scenarios.
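The abstract refers to a latent space regularization strength without giving the objective. As a rough illustration only, the sketch below writes out a generic beta-weighted VAE loss for one sample (reconstruction error on observed entries plus a weighted KL term) and the step that fills missing entries from the decoder output. This is an assumption about the general form, not the authors' implementation: the names `masked_mse`, `gaussian_kl`, `vae_loss`, `fill_missing`, and the parameter `beta` are hypothetical, and the encoder and decoder networks that would produce `mu`, `log_var`, and `x_hat` are omitted.

```python
import math


def masked_mse(x, x_hat, observed):
    """Mean squared error over the entries flagged as observed."""
    total, count = 0.0, 0
    for xi, xh, is_obs in zip(x, x_hat, observed):
        if is_obs:
            total += (xi - xh) ** 2
            count += 1
    return total / count if count else 0.0


def gaussian_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def vae_loss(x, x_hat, observed, mu, log_var, beta=1.0):
    """Reconstruction error plus beta times the KL regularizer (assumed form)."""
    return masked_mse(x, x_hat, observed) + beta * gaussian_kl(mu, log_var)


def fill_missing(x, x_hat, observed):
    """Keep observed values; use the decoder output where a value was missing."""
    return [xi if is_obs else xh for xi, xh, is_obs in zip(x, x_hat, observed)]


# Toy sample: the third value is missing (its placeholder 0.0 is ignored).
x = [1.0, 2.0, 0.0]
observed = [True, True, False]
x_hat = [1.0, 4.0, 3.0]        # stand-in for a decoder output
mu, log_var = [1.0], [0.0]     # stand-in for an encoder output

loss = vae_loss(x, x_hat, observed, mu, log_var, beta=2.0)
imputed = fill_missing(x, x_hat, observed)
```

In this sketch, setting `beta=0.0` removes the KL term and leaves only the reconstruction error, which is one simple way to picture the contrast with a deterministic auto-encoder objective; larger `beta` values pull the latent distribution harder toward the standard normal prior.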
Pages: 12
Related papers
50 in total
  • [31] Generation and Extraction of Color Palettes with Adversarial Variational Auto-Encoders
    Moussa, Ahmad
    Watanabe, Hiroshi
    PROCEEDINGS OF SIXTH INTERNATIONAL CONGRESS ON INFORMATION AND COMMUNICATION TECHNOLOGY (ICICT 2021), VOL 2, 2022, 236 : 889 - 897
  • [32] Transforming Auto-Encoders
    Hinton, Geoffrey E.
    Krizhevsky, Alex
    Wang, Sida D.
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2011, PT I, 2011, 6791 : 44 - 51
  • [33] SAE-Impute: imputation for single-cell data via subspace regression and auto-encoders
    Bai, Liang
    Ji, Boya
    Wang, Shulin
    BMC BIOINFORMATICS, 2024, 25 (01):
  • [34] On Disentanglement and Mutual Information in Semi-Supervised Variational Auto-Encoders
    Gordon Rodriguez, Elliott
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 1257 - 1262
  • [35] Unsupervised Phonocardiogram Analysis With Distribution Density Based Variational Auto-Encoders
    Li, Shengchen
    Tian, Ke
    FRONTIERS IN MEDICINE, 2021, 8
  • [36] VARIATIONAL AUTO-ENCODERS WITHOUT GRAPH COARSENING FOR FINE MESH LEARNING
    Vercheval, Nicolas
    De Bie, Hendrik
    Pizurica, Aleksandra
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 2681 - 2685
  • [37] Micro and Macro Level Graph Modeling for Graph Variational Auto-Encoders
    Zahirnia, Kiarash
    Schulte, Oliver
    Naddaf, Parmis
    Li, Ke
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [38] Adversarial Training of Variational Auto-encoders for High Fidelity Image Generation
    Khan, Salman H.
    Hayat, Munawar
    Barnes, Nick
    2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, : 1312 - 1320
  • [39] Learning from Nested Data with Ornstein Auto-Encoders
    Choi, Youngwon
    Lee, Sungdong
    Won, Joong-Ho
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [40] Towards Deeper Understanding of Variational Auto-encoders for Binary Collaborative Filtering
    Zamani, Siamak
    Li, Dingcheng
    Fei, Hongliang
    Li, Ping
    PROCEEDINGS OF THE 2022 ACM SIGIR INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL, ICTIR 2022, 2022, : 175 - 184