scDC: single cell differential composition analysis

Cited by: 25
Authors
Cao, Yue [1]
Lin, Yingxin [1]
Ormerod, John T. [1]
Yang, Pengyi [1, 2, 3]
Yang, Jean Y. H. [1, 2]
Lo, Kitty K. [1]
Affiliations
[1] Univ Sydney, Sch Math & Stat, Sydney, NSW 2006, Australia
[2] Univ Sydney, Charles Perkins Ctr, Sydney, NSW 2006, Australia
[3] Univ Sydney, Fac Med & Hlth, Childrens Med Res Inst, Sydney, NSW 2145, Australia
Funding
Australian Research Council; National Health and Medical Research Council of Australia
Keywords
Single cell; RNA-seq; scRNA-seq; Composition analysis; Simultaneous confidence intervals
DOI
10.1186/s12859-019-3211-9
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Differences in cell-type composition across subjects and conditions often carry biological significance. Recent advances in single cell sequencing technologies enable cell types to be identified at the single cell level, so the cell-type composition of tissues can now be studied in fine detail. However, cell-type composition analysis still faces a number of challenges: no existing method identifies cell types perfectly, and variability due to cell sampling is present in every single cell experiment. This necessitates the development of methods for estimating uncertainty in cell-type composition.
Results: We developed a novel single cell differential composition (scDC) analysis method that performs differential cell-type composition analysis via bootstrap resampling. scDC captures the uncertainty associated with the cell-type proportions of each subject via bias-corrected and accelerated (BCa) bootstrap confidence intervals. We assessed the performance of our method using a number of simulated datasets and synthetic datasets curated from publicly available single cell datasets. In simulated datasets, scDC correctly recovered the true cell-type proportions. In synthetic datasets, the cell-type compositions returned by scDC were highly concordant with reference cell-type compositions from the original data. Since the majority of datasets tested in this study have only 2 to 5 subjects per condition, the addition of confidence intervals enabled better comparisons of compositional differences between subjects and across conditions.
Conclusions: scDC is a novel statistical method for performing differential cell-type composition analysis of scRNA-seq data. It uses bootstrap resampling to estimate the standard errors associated with cell-type proportion estimates and performs significance testing through generalized linear models (GLM) and generalized linear mixed models (GLMM).
We have made this method available to the scientific community as part of the scdney (Single Cell Data Integrative Analysis) R package, available from https://github.com/SydneyBioX/scdney.
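The idea of a bias-corrected and accelerated (BCa) bootstrap interval for one subject's cell-type proportion can be illustrated with a minimal Python sketch. This is not the scdney implementation (which is an R package): the function names `proportion` and `bca_interval`, the parameter defaults, and the toy labels are all assumptions made for illustration. The sketch resamples fixed cell-type labels only, and does not cover cell-type identification or the GLM/GLMM testing step.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for z-scores


def proportion(labels, cell_type):
    """Fraction of cells in `labels` assigned to `cell_type`."""
    return sum(1 for x in labels if x == cell_type) / len(labels)


def bca_interval(labels, cell_type, n_boot=2000, alpha=0.05, seed=0):
    """Hypothetical helper: BCa bootstrap interval for one proportion.

    `labels` holds one cell-type label per cell for a single subject
    (at least two cells). Returns (estimate, lower, upper).
    """
    rng = random.Random(seed)
    n = len(labels)
    theta_hat = proportion(labels, cell_type)

    # Resample cells with replacement and recompute the proportion.
    boot = sorted(
        proportion(rng.choices(labels, k=n), cell_type) for _ in range(n_boot)
    )

    # Bias correction z0. Proportions are discrete, so ties are counted
    # as one half, and the fraction is clamped away from 0 and 1.
    below = sum(1 for t in boot if t < theta_hat)
    ties = sum(1 for t in boot if t == theta_hat)
    frac = (below + 0.5 * ties) / n_boot
    frac = min(max(frac, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = _STD_NORMAL.inv_cdf(frac)

    # Acceleration from leave-one-out (jackknife) estimates.
    count = sum(1 for x in labels if x == cell_type)
    jack = [(count - (x == cell_type)) / (n - 1) for x in labels]
    jack_mean = sum(jack) / n
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den > 0 else 0.0

    def adjusted_level(q):
        # Map a nominal quantile level to its BCa-adjusted level.
        z = _STD_NORMAL.inv_cdf(q)
        return _STD_NORMAL.cdf(z0 + (z0 + z) / (1 - accel * (z0 + z)))

    def pick(level):
        # Read the adjusted quantile off the sorted bootstrap values.
        return boot[min(n_boot - 1, max(0, int(level * n_boot)))]

    return theta_hat, pick(adjusted_level(alpha / 2)), pick(adjusted_level(1 - alpha / 2))


# Toy subject with 200 cells (made-up labels, for illustration only).
toy_labels = ["T cell"] * 120 + ["B cell"] * 60 + ["Monocyte"] * 20
estimate, lower, upper = bca_interval(toy_labels, "B cell")
```

With only a few subjects per condition, intervals of this kind give a per-subject measure of sampling uncertainty that a single point estimate of the proportion cannot.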
Pages: 12