Benchmarking of cell type deconvolution pipelines for transcriptomics data

Cited by: 257
Authors
Cobos, Francisco Avila [1, 2, 3]
Alquicira-Hernandez, Jose [3, 4]
Powell, Joseph E. [3, 4]
Mestdagh, Pieter [1, 2]
De Preter, Katleen [1, 2]
Affiliations
[1] Univ Ghent, Ctr Med Genet Ghent, Dept Biomol Med, Ghent, Belgium
[2] Canc Res Inst Ghent CRIG, Ghent, Belgium
[3] Garvan Inst Med Res, Garvan Weizmann Ctr Cellular Genom, Sydney, NSW, Australia
[4] Univ Queensland, Inst Mol Biosci, Brisbane, Qld, Australia
Funding
European Union Horizon 2020
Keywords
NORMALIZATION; SIGNATURES;
DOI
10.1038/s41467-020-19015-1
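Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract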
Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking. Using five single-cell RNA-sequencing (scRNA-seq) datasets, we generate pseudo-bulk mixtures to evaluate the combined impact of these factors. Both bulk deconvolution methodologies and those that use scRNA-seq data as reference perform best when applied to data in linear scale, and the choice of normalization has a dramatic impact on some, but not all, methods. Overall, methods that use scRNA-seq data have comparable performance to the best performing bulk methods, whereas semi-supervised approaches show higher error values. Moreover, failure to include cell types in the reference that are present in a mixture leads to substantially worse results, regardless of the previous choices. Altogether, we evaluate the combined impact of factors affecting the deconvolution task across different datasets and propose general guidelines to maximize its performance.

Inferring cell type proportions from transcriptomics data is affected by data transformation, normalization, choice of method and the markers used. Here, the authors use single-cell RNAseq datasets to evaluate the impact of these factors and propose guidelines to maximise deconvolution performance.
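The pseudo-bulk evaluation described in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: the toy expression profiles, the noise model, the helper names (`simulate_cells`, `make_pseudobulk`, `nnls_projected_gradient`) and the use of RMSE as the error measure are all assumptions made for illustration. The sketch averages simulated single cells into a mixture with known proportions, keeps the data in linear scale, and recovers the proportions by non-negative least squares solved with a simple projected gradient loop.

```python
import random


def simulate_cells(profiles, n_per_type, rng, noise=0.05):
    """Simulate linear-scale expression per cell (toy data, not real scRNA-seq)."""
    return {
        cell_type: [
            [max(0.0, mu * (1.0 + rng.gauss(0.0, noise))) for mu in profile]
            for _ in range(n_per_type)
        ]
        for cell_type, profile in profiles.items()
    }


def mean_profile(cells):
    """Average expression per gene across the cells of one type."""
    n_genes = len(cells[0])
    return [sum(c[g] for c in cells) / len(cells) for g in range(n_genes)]


def make_pseudobulk(cells_by_type, proportions, n_cells, rng):
    """Average randomly drawn cells so each type contributes its stated share."""
    n_genes = len(next(iter(cells_by_type.values()))[0])
    bulk = [0.0] * n_genes
    drawn = 0
    for cell_type, share in proportions.items():
        k = round(share * n_cells)
        for _ in range(k):
            cell = rng.choice(cells_by_type[cell_type])
            for g in range(n_genes):
                bulk[g] += cell[g]
        drawn += k
    return [x / drawn for x in bulk]


def nnls_projected_gradient(signature, bulk, n_iter=2000):
    """Minimise ||S p - b||^2 subject to p >= 0; signature[t] is the profile of type t."""
    n_types, n_genes = len(signature), len(bulk)
    # 1 / trace(S^T S) is a safe step size: the trace bounds the largest eigenvalue.
    step = 1.0 / sum(v * v for row in signature for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(signature[t][g] * p[t] for t in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        grad = [
            sum(signature[t][g] * residual[g] for g in range(n_genes))
            for t in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, p[t] - step * grad[t]) for t in range(n_types)]
    total = sum(p)
    return [x / total for x in p]  # rescale so the proportions sum to 1


rng = random.Random(0)
# Hypothetical marker-like profiles: 3 cell types x 4 genes, linear scale.
profiles = {"A": [10, 1, 1, 5], "B": [1, 10, 1, 5], "C": [1, 1, 10, 5]}
cells = simulate_cells(profiles, 100, rng)
types = sorted(profiles)
signature = [mean_profile(cells[t]) for t in types]

truth = {"A": 0.5, "B": 0.3, "C": 0.2}
bulk = make_pseudobulk(cells, truth, 200, rng)
estimate = nnls_projected_gradient(signature, bulk)
rmse = (sum((e - truth[t]) ** 2 for e, t in zip(estimate, types)) / len(types)) ** 0.5

for t, e in zip(types, estimate):
    print(f"{t}: true={truth[t]:.2f} estimated={e:.3f}")
print(f"RMSE={rmse:.4f}")
```

Dropping one cell type from `signature` while leaving it in `truth` is a quick way to see, on this toy data, the kind of degradation the abstract attributes to an incomplete reference.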
Pages: 14