Statistics or biology: the zero-inflation controversy about scRNA-seq data

Cited by: 114
Authors
Jiang, Ruochen [1]
Sun, Tianyi [1]
Song, Dongyuan [2]
Li, Jingyi Jessica [1, 3, 4, 5]
Affiliations
[1] Univ Calif Los Angeles, Dept Stat, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Bioinformat Interdept PhD Program, Los Angeles, CA 90095 USA
[3] Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA 90095 USA
[4] Univ Calif Los Angeles, Dept Computat Med, Los Angeles, CA 90095 USA
[5] Univ Calif Los Angeles, Dept Stat, Los Angeles, CA 90095 USA
Funding
US National Science Foundation;
Keywords
CELL GENE-EXPRESSION; SINGLE-CELL; RNA-SEQ; FATE DECISIONS; DNA; RECONSTRUCTION; AMPLIFICATION; IMPUTATION; BINDING; MODEL;
DOI
10.1186/s13059-022-02601-5
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Researchers view vast zeros in single-cell RNA-seq data differently: some regard zeros as biological signals representing no or low gene expression, while others regard zeros as missing data to be corrected. To help address the controversy, here we discuss the sources of biological and non-biological zeros; introduce five mechanisms of adding non-biological zeros in computational benchmarking; evaluate the impacts of non-biological zeros on data analysis; benchmark three input data types: observed counts, imputed counts, and binarized counts; discuss the open questions regarding non-biological zeros; and advocate the importance of transparent analysis.
Pages: 24