Benchmarking of cell type deconvolution pipelines for transcriptomics data

Cited by: 224
Authors
Cobos, Francisco Avila [1, 2, 3]
Alquicira-Hernandez, Jose [3, 4]
Powell, Joseph E. [3, 4]
Mestdagh, Pieter [1, 2]
De Preter, Katleen [1, 2]
Affiliations
[1] Univ Ghent, Ctr Med Genet Ghent, Dept Biomol Med, Ghent, Belgium
[2] Canc Res Inst Ghent CRIG, Ghent, Belgium
[3] Garvan Inst Med Res, Garvan Weizmann Ctr Cellular Genom, Sydney, NSW, Australia
[4] Univ Queensland, Inst Mol Biosci, Brisbane, Qld, Australia
Funding
European Union Horizon 2020;
Keywords
NORMALIZATION; SIGNATURES;
DOI
10.1038/s41467-020-19015-1
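Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes
07; 0710; 09;
Abstract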
Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking. Using five single-cell RNA-sequencing (scRNA-seq) datasets, we generate pseudo-bulk mixtures to evaluate the combined impact of these factors. Both bulk deconvolution methodologies and those that use scRNA-seq data as reference perform best when applied to data in linear scale and the choice of normalization has a dramatic impact on some, but not all methods. Overall, methods that use scRNA-seq data have comparable performance to the best performing bulk methods whereas semi-supervised approaches show higher error values. Moreover, failure to include cell types in the reference that are present in a mixture leads to substantially worse results, regardless of the previous choices. Altogether, we evaluate the combined impact of factors affecting the deconvolution task across different datasets and propose general guidelines to maximize its performance. Inferring cell type proportions from transcriptomics data is affected by data transformation, normalization, choice of method and the markers used. Here, the authors use single-cell RNAseq datasets to evaluate the impact of these factors and propose guidelines to maximise deconvolution performance.
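The pseudo-bulk idea in the abstract can be illustrated with a small sketch. This is a toy under stated assumptions, not the authors' pipeline: the data are simulated, the helper names `make_pseudo_bulk` and `estimate_proportions` are hypothetical, and the estimator is plain least squares with clipping and rescaling, standing in for the deconvolution methods actually benchmarked. It also fits the same model in log scale, where the log of a sum is no longer a sum of logs, which is one plausible intuition for the linear-scale finding.

```python
import numpy as np

def make_pseudo_bulk(counts, labels, proportions, n_cells, rng):
    """Sum the counts of cells sampled to match the requested proportions.

    counts: (cells x genes) array; labels: cell type of each cell;
    proportions: dict mapping cell type -> fraction (fractions sum to 1).
    """
    picked = []
    for cell_type, fraction in proportions.items():
        pool = np.flatnonzero(labels == cell_type)
        k = int(round(fraction * n_cells))
        picked.append(rng.choice(pool, size=k, replace=True))
    return counts[np.concatenate(picked)].sum(axis=0)

def estimate_proportions(bulk, signature):
    """Least-squares fit of bulk ~ signature @ p, clipped at 0 and rescaled to sum to 1.

    A simple stand-in estimator (assumption), not one of the benchmarked methods.
    """
    p, *_ = np.linalg.lstsq(signature, bulk, rcond=None)
    p = np.clip(p, 0.0, None)
    return p / p.sum()

rng = np.random.default_rng(0)
cell_types = np.array(["A", "B", "C"])
n_genes, cells_per_type, n_mixed = 50, 200, 1000

# Toy mean expression profile per cell type (genes x cell types), linear scale.
signature = rng.gamma(shape=2.0, scale=50.0, size=(n_genes, 3))

# Toy single-cell counts: Poisson noise around each cell's type profile.
labels = np.repeat(cell_types, cells_per_type)
type_index = np.repeat(np.arange(3), cells_per_type)
counts = rng.poisson(signature.T[type_index])

true_p = {"A": 0.6, "B": 0.3, "C": 0.1}
bulk = make_pseudo_bulk(counts, labels, true_p, n_mixed, rng)
mean_bulk = bulk / n_mixed  # per-cell average, comparable to the signature

# Linear scale: the mixture is approximately a proportion-weighted sum of profiles.
p_linear = estimate_proportions(mean_bulk, signature)

# Log scale: the same linear model is misspecified because log(sum) != sum(log).
p_log = estimate_proportions(np.log2(mean_bulk + 1), np.log2(signature + 1))

print("true      :", list(true_p.values()))
print("linear fit:", np.round(p_linear, 3))
print("log fit   :", np.round(p_log, 3))
```

Because the true proportions are known by construction, any estimator can be scored directly against them, which is what makes pseudo-bulk mixtures convenient for benchmarking.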
Pages: 14