Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size

Cited by: 89
Authors
Yu, Danni [1, 2]
Huber, Wolfgang [1]
Vitek, Olga [2, 3]
Affiliations
[1] European Mol Biol Lab, Genome Biol Unit, D-69117 Heidelberg, Germany
[2] Purdue Univ, Dept Stat, W Lafayette, IN 47907 USA
[3] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
Funding
National Science Foundation (United States)
Keywords
DIFFERENTIAL EXPRESSION ANALYSIS; MAXIMUM-LIKELIHOOD-ESTIMATION; PARAMETER; REPRODUCIBILITY; NORMALIZATION; POWERFUL; PACKAGE;
DOI
10.1093/bioinformatics/btt143
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: RNA-seq experiments produce digital counts of reads that are affected by both biological and technical variation. To distinguish systematic changes in expression between conditions from noise, the counts are frequently modeled by the Negative Binomial distribution. However, in experiments with small sample size, the per-gene estimates of the dispersion parameter are unreliable.
Method: We propose a simple and effective approach for estimating the dispersions. First, we obtain initial estimates for each gene using the method of moments. Second, the estimates are regularized, i.e. shrunk towards a common value that minimizes the average squared difference between the initial estimates and the shrinkage estimates. The approach does not require extra modeling assumptions, is easy to compute and is compatible with the exact test of differential expression.
Results: We evaluated the proposed approach using 10 simulated and experimental datasets and compared its performance with that of the currently popular packages edgeR, DESeq, baySeq, BBSeq and SAMseq. For these datasets, the proposed approach, implemented as sSeq, performed favorably for experiments with small sample size in terms of sensitivity, specificity and computational time.
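The two steps described in the Method paragraph can be sketched as follows. This is a minimal illustration, not the sSeq implementation: it assumes the common Negative Binomial parameterization Var = mu + phi * mu^2, ignores library-size normalization, and uses a fixed shrinkage weight `delta` (a hypothetical simplification; the abstract does not state how the weight is obtained). All function names and the toy counts are made up for the example.

```python
from statistics import mean, variance


def mom_dispersion(counts):
    """Method-of-moments dispersion for one gene, assuming Var = mu + phi * mu**2."""
    mu = mean(counts)
    if mu == 0:
        return 0.0
    s2 = variance(counts)  # unbiased sample variance
    return max((s2 - mu) / mu ** 2, 0.0)  # truncate negative estimates at zero


def shrink(initial, target, delta):
    """Pull each initial estimate toward a common target with weight delta."""
    return [(1 - delta) * phi + delta * target for phi in initial]


def average_squared_difference(initial, shrunk):
    """Mean squared gap between initial and shrunken estimates."""
    return mean((a - b) ** 2 for a, b in zip(initial, shrunk))


def pick_target(initial, candidates, delta):
    """Grid-search the common target that minimizes the average squared difference."""
    return min(
        candidates,
        key=lambda t: average_squared_difference(initial, shrink(initial, t, delta)),
    )


# Toy counts (hypothetical): rows are genes, columns are replicates.
counts = [
    [10, 12, 9],
    [100, 180, 60],
    [5, 5, 5],
    [40, 90, 20],
]
delta = 0.5  # assumed fixed weight, for illustration only
initial = [mom_dispersion(row) for row in counts]
grid = [i / 100 for i in range(0, 101)]  # candidate targets 0.00 .. 1.00
target = pick_target(initial, grid, delta)
shrunk = shrink(initial, target, delta)
```

With a fixed weight, the target that minimizes the average squared difference is simply the grid point closest to the mean of the initial estimates, so the search mainly makes the criterion explicit. A data-driven weight would change which target is selected.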
Pages: 1275-1282
Number of pages: 8