Statistics or biology: the zero-inflation controversy about scRNA-seq data

Cited by: 108
Authors
Jiang, Ruochen [1]
Sun, Tianyi [1]
Song, Dongyuan [2]
Li, Jingyi Jessica [1,3,4,5]
Affiliations
[1] Univ Calif Los Angeles, Dept Stat, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Bioinformat Interdept PhD Program, Los Angeles, CA 90095 USA
[3] Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA 90095 USA
[4] Univ Calif Los Angeles, Dept Computat Med, Los Angeles, CA 90095 USA
[5] Univ Calif Los Angeles, Dept Stat, Los Angeles, CA 90095 USA
Funding
US National Science Foundation;
Keywords
CELL GENE-EXPRESSION; SINGLE-CELL; RNA-SEQ; FATE DECISIONS; DNA; RECONSTRUCTION; AMPLIFICATION; IMPUTATION; BINDING; MODEL;
DOI
10.1186/s13059-022-02601-5
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005; 0836; 090102; 100705;
Abstract
Researchers view vast zeros in single-cell RNA-seq data differently: some regard zeros as biological signals representing no or low gene expression, while others regard zeros as missing data to be corrected. To help address the controversy, here we discuss the sources of biological and non-biological zeros; introduce five mechanisms of adding non-biological zeros in computational benchmarking; evaluate the impacts of non-biological zeros on data analysis; benchmark three input data types: observed counts, imputed counts, and binarized counts; discuss the open questions regarding non-biological zeros; and advocate the importance of transparent analysis.
Pages: 24