A benchmark study of simulation methods for single-cell RNA sequencing data

Times cited: 36
Authors
Cao, Yue [1, 2]
Yang, Pengyi [1, 2, 3]
Yang, Jean Yee Hwa [1, 2]
Affiliations
[1] Univ Sydney, Charles Perkins Ctr, Sydney, NSW, Australia
[2] Univ Sydney, Sch Math & Stat, Sydney, NSW, Australia
[3] Childrens Med Res Inst, Computat Syst Biol Grp, Westmead, NSW, Australia
Funding
National Health and Medical Research Council (Australia); Medical Research Council (UK)
DOI
10.1038/s41467-021-27130-w
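Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09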
Abstract
Single-cell RNA-seq (scRNA-seq) data simulation is critical for evaluating computational methods for analysing scRNA-seq data, especially when ground truth is experimentally unattainable. The reliability of such evaluation depends on the ability of simulation methods to capture the properties of experimental data. However, while many scRNA-seq data simulation methods have been proposed, a systematic evaluation of these methods is lacking. We develop a comprehensive evaluation framework, SimBench, which includes a kernel density estimation measure, to benchmark 12 simulation methods across 35 experimental scRNA-seq datasets. We evaluate the simulation methods on a panel of data properties, their ability to maintain biological signals, and their scalability and applicability. Our benchmark uncovers performance differences among the methods and highlights the varying difficulty of simulating different data characteristics. Furthermore, we identify several limitations, including difficulty in maintaining heterogeneity of distribution. These results, together with the framework and datasets made publicly available as R packages, will guide the selection of simulation methods and their future development.
Simulation is useful for developing and evaluating computational methods. Here, the authors develop a comprehensive evaluation framework, SimBench, to benchmark single-cell RNA-seq simulation methods through a diverse collection of experimental datasets.
Pages: 12
Related papers
35 items in total
[1]  
Anders S., 2010, GENOME BIOL, V11, pR106, DOI 10.1186/gb-2010-11-10-r106
[2]  
Armstrong J.Scott., 1978, LONG RANGE FORECASTI
[3]   SPsimSeq: semi-parametric simulation of bulk and single-cell RNA-sequencing data [J].
Assefa, Alemu Takele ;
Vandesompele, Jo ;
Thas, Olivier .
BIOINFORMATICS, 2020, 36 (10) :3276-3278
[4]   SPARSim single cell: a count data simulator for scRNA-seq data [J].
Baruzzo, Giacomo ;
Patuzzi, Ilaria ;
Di Camillo, Barbara .
BIOINFORMATICS, 2020, 36 (05) :1468-1475
[5]   Spearheading future omics analyses using dyngen, a multi-modal simulator of single cells [J].
Cannoodt, Robrecht ;
Saelens, Wouter ;
Deconinck, Louise ;
Saeys, Yvan .
NATURE COMMUNICATIONS, 2021, 12 (01)
[6]  
Cao Y, SYDNEYBIOX SIMBENCH, DOI 10.5281/ZENODO.5575047
[7]   UMI-count modeling and differential expression analysis for single-cell RNA sequencing [J].
Chen, Wenan ;
Li, Yan ;
Easton, John ;
Finkelstein, David ;
Wu, Gang ;
Chen, Xiang .
GENOME BIOLOGY, 2018, 19
[8]   SERGIO: A Single-Cell Expression Simulator Guided by Gene Regulatory Networks [J].
Dibaeinia, Payam ;
Sinha, Saurabh .
CELL SYSTEMS, 2020, 11 (03) :252-+
[9]  
Ding J, 2020, NAT BIOTECHNOL, V38, P737, DOI 10.1038/s41587-020-0465-8
[10]   Using control genes to correct for unwanted variation in microarray data [J].
Gagnon-Bartsch, Johann A. ;
Speed, Terence P. .
BIOSTATISTICS, 2012, 13 (03) :539-552