A comparison of methods accounting for batch effects in differential expression analysis of UMI count based single cell RNA sequencing

Citations: 23
Authors
Chen, Wenan [1]
Zhang, Silu [2]
Williams, Justin [3]
Ju, Bensheng [3]
Shaner, Bridget [3]
Easton, John [3]
Wu, Gang [1]
Chen, Xiang [3]
Affiliations
[1] St Jude Childrens Res Hosp, Ctr Appl Bioinformat, 332 N Lauderdale St, Memphis, TN 38105 USA
[2] St Jude Childrens Res Hosp, Dept Diagnost Imaging, 332 N Lauderdale St, Memphis, TN 38105 USA
[3] St Jude Childrens Res Hosp, Dept Computat Biol, 262 Danny Thomas Pl,Mail Stop 1135, Memphis, TN 38105 USA
Source
COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL | 2020 / Vol. 18 / Issue 18
Funding
National Institutes of Health (USA);
Keywords
scRNA-seq; Differential expression analysis; Batch effects; Latent batch effects; Aggregation-based methods; Fixed effect model; Mixed effect model; Surrogate variable based methods; GENE-EXPRESSION; NORMALIZATION;
DOI
10.1016/j.csbj.2020.03.026
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Accounting for batch effects, especially latent batch effects, in differential expression (DE) analysis is critical for identifying true biological effects. Single-cell RNA sequencing (scRNA-seq) is a powerful tool for quantifying cell-to-cell variation in transcript abundance and characterizing cellular dynamics. Although many scRNA-seq DE analysis methods accommodate known batch variables, their performance has not been systematically evaluated. Moreover, the challenge of accounting for latent batch variables in scRNA-seq DE analysis is largely unmet. In contrast, many methods have been developed to account for batch variables (either known or latent) in other high-dimensional data, especially bulk RNA-seq. We extensively evaluate 11 methods for batch variables in different scRNA-seq DE analysis scenarios, with a primary focus on latent batch variables. We demonstrate that for known batch variables, incorporating them as covariates into a regression model outperformed approaches using a batch-corrected matrix. For latent batches, fixed effects models have inflated FDRs, whereas aggregation-based methods and mixed effects models have significant power loss. Surrogate variable based methods generally control the FDR well while achieving good power with small group effects. However, their performance (except that of SVA) deteriorated substantially in scenarios involving large group effects and/or group label impurity. In these settings, SVA achieves relatively good performance despite an occasionally inflated FDR (up to 0.2). Finally, we make the following recommendations for scRNA-seq DE analysis: 1) incorporate known batch variables instead of using batch-corrected data; and 2) employ SVA for latent batch correction. However, better methods are still needed to fully unleash the power of scRNA-seq. © 2020 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.
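The first recommendation, including a known batch variable as a covariate in the regression model, can be illustrated with a toy example. The sketch below is not one of the 11 methods evaluated in the paper: it assumes a single gene with Gaussian noise on a log-like scale (real UMI counts would call for a count model), an unbalanced design in which group and batch are correlated, and made-up effect sizes. The function name `fit_group_effect` and all numeric values are hypothetical.

```python
import numpy as np


def fit_group_effect(y, group, batch=None):
    """Ordinary least squares estimate of the group coefficient.

    If `batch` is given, it is added to the design matrix as a covariate.
    """
    columns = [np.ones_like(y), group.astype(float)]
    if batch is not None:
        columns.append(batch.astype(float))
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]  # coefficient of the group indicator


rng = np.random.default_rng(0)

# Assumed unbalanced design: batch 0 holds mostly group 0 cells,
# batch 1 holds mostly group 1 cells, so batch confounds group.
group = np.array([0] * 80 + [1] * 20 + [0] * 20 + [1] * 80)
batch = np.array([0] * 100 + [1] * 100)

# Hypothetical effect sizes for one simulated gene.
true_group_effect = 0.5
batch_effect = 1.0
y = (true_group_effect * group
     + batch_effect * batch
     + rng.normal(0.0, 0.5, size=group.size))

unadjusted = fit_group_effect(y, group)        # batch ignored
adjusted = fit_group_effect(y, group, batch)   # batch as covariate

print(f"true group effect:      {true_group_effect:.2f}")
print(f"estimate without batch: {unadjusted:.2f}")
print(f"estimate with batch:    {adjusted:.2f}")
```

In this simulated setting the estimate that ignores batch absorbs part of the batch effect, while the covariate-adjusted estimate stays close to the simulated group effect. It says nothing about latent batches, where the batch labels are unknown and the abstract recommends SVA.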
Pages: 861-873
Number of pages: 13