baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data

Cited by: 493
Authors
Hardcastle, Thomas J. [1]
Kelly, Krystyna A. [1]
Affiliations
[1] Univ Cambridge, Dept Plant Sci, Cambridge, England
Source
BMC BIOINFORMATICS | 2010 / Volume 11
Keywords
SAGE; BIOCONDUCTOR; PACKAGE
DOI
10.1186/1471-2105-11-422
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: High-throughput sequencing has become an important technology for studying expression levels in many types of genomic, and particularly transcriptomic, data. One key way of analysing such data is to look for elements of the data which display particular patterns of differential expression in order to take these forward for further analysis and validation.
Results: We propose a framework for defining patterns of differential expression and develop a novel algorithm, baySeq, which uses an empirical Bayes approach to detect these patterns of differential expression within a set of sequencing samples. The method assumes a negative binomial distribution for the data and derives an empirically determined prior distribution from the entire dataset. We examine the performance of the method on real and simulated data.
Conclusions: Our method performs at least as well as, and often better than, existing methods for analyses of pairwise differential expression in both real and simulated data. When we compare methods for the analysis of data from experimental designs involving multiple sample groups, our method again shows substantial gains in performance. We believe that this approach thus represents an important step forward for the analysis of count data from sequencing experiments.
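The idea summarised in the abstract (negative binomial counts, a prior derived empirically from the whole dataset, and posterior probabilities for competing expression patterns) can be illustrated with a small sketch. This is not the baySeq implementation: the function names, the method-of-moments estimator, the equal-library-size assumption, the floor values, the 0.5 prior model probability, and the toy counts below are all assumptions made purely for illustration.

```python
import math

def nb_logpmf(k, mu, phi):
    """Log-probability of count k under a negative binomial with mean mu
    and dispersion phi (variance = mu + phi * mu**2)."""
    r = 1.0 / phi
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))

def moment_estimate(counts):
    """Crude method-of-moments (mu, phi) estimate; the floors are assumptions
    that keep both parameters strictly positive."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    mu = max(mean, 0.5)
    phi = max((var - mu) / mu ** 2, 1e-3)
    return mu, phi

def empirical_prior(data, columns):
    """Empirical prior: one (mu, phi) point per row of the dataset,
    estimated from the given sample columns."""
    return [moment_estimate([row[j] for j in columns]) for row in data]

def log_marginal(counts, prior):
    """Log of the likelihood averaged over the prior points (log-sum-exp)."""
    logs = [sum(nb_logpmf(k, mu, phi) for k in counts) for mu, phi in prior]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs) / len(logs))

def posterior_de(row, group_a, group_b, priors, prior_de=0.5):
    """Posterior probability that a row follows the two-group (DE) pattern
    rather than the single-group (non-DE) pattern."""
    log_nde = log_marginal([row[j] for j in group_a + group_b], priors["all"])
    log_de = (log_marginal([row[j] for j in group_a], priors["a"])
              + log_marginal([row[j] for j in group_b], priors["b"]))
    a = log_de + math.log(prior_de)
    b = log_nde + math.log(1.0 - prior_de)
    top = max(a, b)
    return math.exp(a - top) / (math.exp(a - top) + math.exp(b - top))

# Toy counts (invented): columns 0-1 are condition A, columns 2-3 condition B.
data = [
    [10, 12, 11, 9],
    [50, 48, 55, 47],
    [100, 95, 110, 105],
    [5, 7, 6, 4],
    [20, 22, 18, 21],
    [5, 6, 60, 55],
    [80, 90, 8, 10],
    [30, 28, 33, 31],
]
group_a, group_b = [0, 1], [2, 3]
priors = {
    "all": empirical_prior(data, group_a + group_b),
    "a": empirical_prior(data, group_a),
    "b": empirical_prior(data, group_b),
}
posteriors = [posterior_de(row, group_a, group_b, priors) for row in data]
```

In this sketch every row contributes one point to the prior, and the marginal likelihood of each pattern is the average likelihood over those points. Rows whose counts differ sharply between the two conditions should receive a posterior close to 1, while rows with stable counts should receive a lower value.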
Pages: 14
Related papers
25 in total
  • [1] ANDERS S, 2010, NATURE P
  • [2] [Anonymous], 2007, R LANG ENV STAT COMP
  • [3] Overdispersed logistic regression for SAGE: Modelling multiple groups and covariates
    Baggerly, KA
    Deng, L
    Morris, JS
    Aldaz, CM
    [J]. BMC BIOINFORMATICS, 2004, 5 (1)
  • [4] Whole-genome re-sequencing
    Bentley, David R.
    [J]. CURRENT OPINION IN GENETICS & DEVELOPMENT, 2006, 16 (06) : 545 - 552
  • [5] Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments
    Bullard, James H.
    Purdom, Elizabeth
    Hansen, Kasper D.
    Dudoit, Sandrine
    [J]. BMC BIOINFORMATICS, 2010, 11
  • [6] Methods for approximating integrals in statistics with special emphasis on Bayesian integration problems
    Evans, M
    Swartz, T
    [J]. STATISTICAL SCIENCE, 1995, 10 (03) : 254 - 272
  • [7] Bioconductor: open software development for computational biology and bioinformatics
    Gentleman, RC
    Carey, VJ
    Bates, DM
    Bolstad, B
    Dettling, M
    Dudoit, S
    Ellis, B
    Gautier, L
    Ge, YC
    Gentry, J
    Hornik, K
    Hothorn, T
    Huber, W
    Iacus, S
    Irizarry, R
    Leisch, F
    Li, C
    Maechler, M
    Rossini, AJ
    Sawitzki, G
    Smith, C
    Smyth, G
    Tierney, L
    Yang, JYH
    Zhang, JH
    [J]. GENOME BIOLOGY, 2004, 5 (10)
  • [8] HARDCASTLE TJ, 2009, BAYSEQ PATTERNS DIFF
  • [9] Empirical Bayes microarray ANOVA and grouping cell lines by equal expression levels
    Lonnstedt, Ingrid
    Rimini, Rebecca
    Nilsson, Peter
    [J]. STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY, 2005, 4 : i - 31
  • [10] Identifying differential expression in multiple SAGE libraries: an overdispersed log-linear model approach
    Lu, J
    Tomfohr, JK
    Kepler, TB
    [J]. BMC BIOINFORMATICS, 2005, 6 (1)