Benchmarking UMI-based single-cell RNA-seq preprocessing workflows

Cited by: 32
Authors
You, Yue [1, 2]
Tian, Luyi [1, 2]
Su, Shian [1, 2]
Dong, Xueyi [1, 2]
Jabbari, Jafar S. [3, 4]
Hickey, Peter F. [1, 2, 5]
Ritchie, Matthew E. [1, 2, 6]
Affiliations
[1] Walter & Eliza Hall Inst Med Res, Epigenet & Dev Div, 1G Royal Parade, Parkville, Vic, Australia
[2] Univ Melbourne, Dept Med Biol, Parkville, Vic, Australia
[3] Victorian Comprehens Canc Ctr, Australian Genome Res Facil, Melbourne, Vic, Australia
[4] Univ Melbourne, Peter Doherty Inst Infect & Immun, Dept Microbiol & Immunol, Microbiol Diagnost Unit Publ Hlth Lab, Melbourne, Vic, Australia
[5] Walter & Eliza Hall Inst Med Res, Single Cell Open Res Endeavour SCORE, 1G Royal Parade, Parkville, Vic, Australia
[6] Univ Melbourne, Sch Math & Stat, Parkville, Vic, Australia
Funding
Medical Research Council (UK); Australian Research Council; National Health and Medical Research Council (Australia);
Keywords
scRNA-seq; Transcriptomics; Methods comparison; Sequencing analysis; Preprocessing; DIFFERENTIAL EXPRESSION ANALYSIS; QUANTIFICATION; READS; BIAS;
DOI
10.1186/s13059-021-02552-3
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Subject classification codes
071005; 0836; 090102; 100705;
Abstract
Background: Single-cell RNA-sequencing (scRNA-seq) technologies and associated analysis methods have rapidly developed in recent years. This includes preprocessing methods, which assign sequencing reads to genes to create count matrices for downstream analysis. While several packaged preprocessing workflows have been developed to provide users with convenient tools for handling this process, how they compare to one another and how they influence downstream analysis have not been well studied.
Results: Here, we systematically benchmark the performance of 10 end-to-end preprocessing workflows (Cell Ranger, Optimus, salmon alevin, alevin-fry, kallisto bustools, dropSeqPipe, scPipe, zUMIs, celseq2, and scruff) using datasets yielding different biological complexity levels generated by CEL-Seq2 and 10x Chromium platforms. We compare these workflows in terms of their quantification properties directly and their impact on normalization and clustering by evaluating the performance of different method combinations. While the scRNA-seq preprocessing workflows compared vary in their detection and quantification of genes across datasets, after downstream analysis with performant normalization and clustering methods, almost all combinations produce clustering results that agree well with the known cell type labels that provided the ground truth in our analysis.
Conclusions: In summary, the choice of preprocessing method was found to be less important than other steps in the scRNA-seq analysis process. Our study comprehensively compares common scRNA-seq preprocessing workflows and summarizes their characteristics to guide workflow users.
Pages: 32
Related papers
75 in total
  • [1] A comparison of automatic cell identification methods for single-cell RNA sequencing data
    Abdelaal, Tamim
    Michielsen, Lieke
    Cats, Davy
    Hoogduin, Dylan
    Mei, Hailiang
    Reinders, Marcel J. T.
    Mahfouz, Ahmed
    [J]. GENOME BIOLOGY, 2019, 20 (01)
  • [2] glmGamPoi: fitting Gamma-Poisson generalized linear models on single cell count data
    Ahlmann-Eltze, Constantin
    Huber, Wolfgang
    [J]. BIOINFORMATICS, 2020, 36 (24) : 5701 - 5702
  • [3] Orchestrating single-cell analysis with Bioconductor
    Amezquita, Robert A.
    Lun, Aaron T. L.
    Becht, Etienne
    Carey, Vince J.
    Carpp, Lindsay N.
    Geistlinger, Ludwig
    Marini, Federico
    Rue-Albrecht, Kevin
    Risso, Davide
    Soneson, Charlotte
    Waldron, Levi
    Pages, Herve
    Smith, Mike L.
    Huber, Wolfgang
    Morgan, Martin
    Gottardo, Raphael
    Hicks, Stephanie C.
    [J]. NATURE METHODS, 2020, 17 (02) : 137 - 145
  • [4] Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage
    Aran, Dvir
    Looney, Agnieszka P.
    Liu, Leqian
    Wu, Esther
    Fong, Valerie
    Hsu, Austin
    Chak, Suzanna
    Naikawadi, Ram P.
    Wolters, Paul J.
    Abate, Adam R.
    Butte, Atul J.
    Bhattacharya, Mallar
    [J]. NATURE IMMUNOLOGY, 2019, 20 (02) : 163 - +
  • [5] Fast unfolding of communities in large networks
    Blondel, Vincent D.
    Guillaume, Jean-Loup
    Lambiotte, Renaud
    Lefebvre, Etienne
    [J]. JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT, 2008
  • [6] Booeshaghi A, 2021, BIORXIV, DOI 10.1101/2021.01.25.428188
  • [7] Near-optimal probabilistic RNA-seq quantification (vol 34, pg 525, 2016)
    Bray, Nicolas L.
    Pimentel, Harold
    Melsted, Pall
    Pachter, Lior
    [J]. NATURE BIOTECHNOLOGY, 2016, 34 (08) : 888 - 888
  • [8] A multicenter study benchmarking single-cell RNA sequencing technologies using reference samples
    Chen, Wanqiu
    Zhao, Yongmei
    Chen, Xin
    Yang, Zhaowei
    Xu, Xiaojiang
    Bi, Yingtao
    Chen, Vicky
    Li, Jing
    Choi, Hannah
    Ernest, Ben
    Tran, Bao
    Mehta, Monika
    Kumar, Parimal
    Farmer, Andrew
    Mir, Alain
    Mehra, Urvashi Ann
    Li, Jian-Liang
    Moos, Malcolm, Jr.
    Xiao, Wenming
    Wang, Charles
    [J]. NATURE BIOTECHNOLOGY, 2021, 39 (09) : 1103 - +
  • [9] Performance Assessment and Selection of Normalization Procedures for Single-Cell RNA-Seq
    Cole, Michael B.
    Risso, Davide
    Wagner, Allon
    DeTomaso, David
    Ngai, John
    Purdom, Elizabeth
    Dudoit, Sandrine
    Yosef, Nir
    [J]. CELL SYSTEMS, 2019, 8 (04) : 315 - +
  • [10] A survey of best practices for RNA-seq data analysis
    Conesa, Ana
    Madrigal, Pedro
    Tarazona, Sonia
    Gomez-Cabrero, David
    Cervera, Alejandra
    McPherson, Andrew
    Szczesniak, Michal Wojciech
    Gaffney, Daniel J.
    Elo, Laura L.
    Zhang, Xuegong
    Mortazavi, Ali
    [J]. GENOME BIOLOGY, 2016, 17