Model-based autoencoders for imputing discrete single-cell RNA-seq data

Times cited: 11
Authors
Tian, Tian [1]
Min, Martin Renqiang [2]
Wei, Zhi [1]
Affiliations
[1] New Jersey Inst Technol, Dept Comp Sci, Newark, NJ 07102 USA
[2] NEC Labs Amer, Princeton, NJ 08540 USA
Funding
U.S. National Science Foundation
Keywords
Deep learning; scRNA-seq; Imputation; Technologies
DOI
10.1016/j.ymeth.2020.09.010
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Deep neural networks have been widely applied to missing-data imputation. However, most existing studies have focused on imputing continuous data, while discrete data imputation remains under-explored. Discrete data are common in the real world, especially in bioinformatics, genetics, and biochemistry. In particular, much recent genomic data consists of discrete counts generated by single-cell RNA sequencing (scRNA-seq) technology. Most scRNA-seq studies produce a discrete matrix in which 'false' zero counts (missing values) are prevalent. To make downstream analyses more effective, imputation, which recovers the missing values, is often performed as the first step in pre-processing scRNA-seq data. In this paper, we propose a novel Zero-Inflated Negative Binomial (ZINB) model-based autoencoder for imputing discrete scRNA-seq data. The novelties of our method are twofold. First, in addition to optimizing the ZINB likelihood, we explicitly model the dropout events that cause missing values by using the Gumbel-Softmax distribution. Second, the zero-inflated reconstruction is further optimized with respect to the raw count matrix. Extensive experiments on simulated datasets demonstrate that the zero-inflated reconstruction significantly improves imputation accuracy. Experiments on real data show that the proposed imputation improves the separation of different cell types and the accuracy of differential expression analysis.
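The two ingredients named in the abstract, the ZINB likelihood and the Gumbel-Softmax relaxation, can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no formulas, so the code below assumes the standard mean/dispersion parameterization of the negative binomial (mean `mu`, dispersion `theta`), a zero-inflation probability `pi`, and the standard Gumbel-Softmax sampler with temperature `tau`. All function and parameter names are hypothetical.

```python
import math
import random


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with
    mean mu > 0 and dispersion theta > 0 (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated NB, where pi in [0, 1)
    is the probability of a structural ('dropout') zero."""
    if x == 0:
        # A zero can come from dropout or from the NB component itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return math.log(1.0 - pi) + nb_log_pmf(x, mu, theta)


def gumbel_softmax(probs, tau, rng):
    """Draw one relaxed (soft) one-hot sample from a categorical distribution
    with strictly positive probabilities `probs` at temperature tau > 0."""
    scores = []
    for p in probs:
        u = max(rng.random(), 1e-12)        # avoid log(0)
        g = -math.log(-math.log(u))         # Gumbel(0, 1) noise
        scores.append((math.log(p) + g) / tau)
    top = max(scores)                       # subtract max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    rng = random.Random(0)
    # Negative log-likelihood of a few counts for one hypothetical gene.
    counts = [0, 0, 3, 7, 0, 1]
    nll = -sum(zinb_log_pmf(x, mu=3.0, theta=2.0, pi=0.3) for x in counts)
    print("ZINB negative log-likelihood:", nll)
    # Relaxed sample of a (dropout, no dropout) indicator.
    print("Gumbel-Softmax sample:", gumbel_softmax([0.3, 0.7], tau=0.5, rng=rng))
```

In a model of the kind the abstract describes, `mu`, `theta`, and `pi` would presumably be outputs of the decoder, and the relaxed dropout indicator would keep the sampling step differentiable; how the paper combines these terms in its loss is not stated in this record.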
Pages: 112-119
Page count: 8