scVAE: variational auto-encoders for single-cell gene expression data

Cited by: 138
Authors
Gronbech, Christopher Heje [1, 2, 3]
Vording, Maximillian Fornitz [3]
Timshel, Pascal N. [4]
Sonderby, Casper Kaae [1]
Pers, Tune H. [4]
Winther, Ole [1, 2, 3]
Affiliations
[1] Univ Copenhagen, Bioinformat Ctr, Dept Biol, DK-2100 Copenhagen, Denmark
[2] Copenhagen Univ Hosp, Rigshosp, Ctr Genom Med, DK-2100 Copenhagen, Denmark
[3] Tech Univ Denmark, Dept Appl Math & Comp Sci, Sect Cognit Syst, DK-2800 Lyngby, Denmark
[4] Univ Copenhagen, Novo Nordisk Fdn Ctr Basic Metab Res, Fac Hlth & Med Sci, DK-2200 Copenhagen N, Denmark
Keywords
RNA-SEQUENCING DATA; MODEL;
DOI
10.1093/bioinformatics/btaa293
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Models for analysing and drawing relevant biological inferences from massive amounts of complex single-cell transcriptomic data typically require several individual data-processing steps, each with its own set of hyperparameter choices. With deep generative models, one can work directly with count data, perform likelihood-based model comparison, learn a latent representation of the cells and capture more of the variability in different cell populations.
Results: We propose a novel method based on variational auto-encoders (VAEs) for the analysis of single-cell RNA sequencing (scRNA-seq) data. It avoids data preprocessing by using raw count data as input and can robustly estimate the expected gene expression levels and a latent representation for each cell. We tested several count likelihood functions and a variant of the VAE that has a priori clustering in the latent space. We show for several scRNA-seq datasets that our method outperforms recently proposed scRNA-seq methods in clustering cells and that the resulting clusters reflect cell types.
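The abstract mentions testing several count likelihood functions and comparing models by likelihood. The sketch below illustrates that idea on a small scale, and it is not the authors' implementation: it scores two common count likelihoods (Poisson and negative binomial) on raw counts and compares the totals. The counts, the `dispersion` value and all function names are hypothetical and chosen only for illustration. In a VAE, the decoder network would normally output such parameters per cell and gene instead of the fixed values used here.

```python
import math


def poisson_log_pmf(k, rate):
    """Log-probability of count k under a Poisson distribution with the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def negative_binomial_log_pmf(k, mean, dispersion):
    """Log-probability of count k under a negative binomial distribution.

    Parameterised by its mean and a dispersion r, so that the variance is
    mean + mean**2 / r (larger r approaches the Poisson case).
    """
    r = dispersion
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mean))
        + k * math.log(mean / (r + mean))
    )


def total_log_likelihood(counts, log_pmf, **params):
    """Sum the log-probabilities of independent counts under one likelihood."""
    return sum(log_pmf(k, **params) for k in counts)


# Hypothetical raw counts for one gene across ten cells (many zeros, a few large values).
counts = [0, 0, 0, 1, 0, 12, 0, 3, 0, 25]
mean = sum(counts) / len(counts)

# Assumed fixed parameters; both likelihoods share the same mean.
ll_poisson = total_log_likelihood(counts, poisson_log_pmf, rate=mean)
ll_nb = total_log_likelihood(
    counts, negative_binomial_log_pmf, mean=mean, dispersion=0.3
)

print(f"Poisson log-likelihood:           {ll_poisson:.2f}")
print(f"Negative binomial log-likelihood: {ll_nb:.2f}")
```

On these made-up, overdispersed counts the negative binomial assigns a higher log-likelihood than the Poisson with the same mean, which is the kind of comparison a likelihood-based evaluation relies on. No normalisation or log-transform is applied to the counts before scoring.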
Pages: 4415-4422
Number of pages: 8