BADGE: A novel Bayesian model for accurate abundance quantification and differential analysis of RNA-Seq data

Cited by: 10
Authors
Gu, Jinghua [1 ]
Wang, Xiao [1 ]
Halakivi-Clarke, Leena [2 ]
Clarke, Robert [2 ]
Xuan, Jianhua [1 ]
Affiliations
[1] Virginia Polytech Inst & State Univ, Dept Elect & Comp Engn, Blacksburg, VA 24061 USA
[2] Georgetown Univ, Dept Oncol, Lombardi Comprehens Canc Ctr, Washington, DC USA
Source
BMC BIOINFORMATICS | 2014 / Volume 15
Funding
National Institutes of Health (USA);
Keywords
EXPRESSION ANALYSIS; GENE-EXPRESSION; TRANSCRIPTOMES; INFERENCE;
DOI
10.1186/1471-2105-15-S9-S6
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Recent advances in RNA sequencing (RNA-Seq) technology have offered unprecedented scope and resolution for transcriptome analysis. However, precise quantification of mRNA abundance and identification of differentially expressed genes are complicated by biological and technical variation in RNA-Seq data. Results: We systematically study the variation in count data and dissect its sources into between-sample variation and within-sample variation. A novel Bayesian framework is developed for joint estimation of gene-level mRNA abundance and differential state, which models the intrinsic variability in RNA-Seq to improve the estimation. Specifically, a Poisson-Lognormal model is incorporated into the Bayesian framework to model within-sample variation; a Gamma-Gamma model is then used to model between-sample variation, which accounts for over-dispersion of read counts among multiple samples. Simulation studies, in which sequencing counts are synthesized from parameters learned from real datasets, demonstrate the advantage of the proposed method in both quantification of mRNA abundance and identification of differentially expressed genes. Moreover, a performance comparison on data from the Sequencing Quality Control (SEQC) Project with ERCC spike-in controls shows that the proposed method outperforms existing RNA-Seq methods in differential analysis. Application to a breast cancer dataset further illustrates that the proposed Bayesian model can 'blindly' estimate sources of variation caused by sequencing biases. Conclusions: We have developed a novel Bayesian hierarchical approach to investigate within-sample and between-sample variations in RNA-Seq data. Simulation and real data applications have validated the desirable performance of the proposed method.
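As a rough illustration of the two sources of variation named in the abstract, the sketch below simulates Poisson counts with lognormal rate noise (within-sample) and Gamma abundances that share a Gamma-distributed rate (between-sample). It is not the BADGE implementation: the abstract does not give the parameterization, so the function names, the parameter values, and the mean-one scaling of the lognormal noise are all assumptions made for illustration.

```python
import math
import random
import statistics


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) count using Knuth's method on chunks of at most 30.

    A sum of independent Poisson draws is Poisson with the summed mean,
    so chunking keeps exp(-chunk) well away from floating-point underflow.
    """
    total = 0
    while lam > 0:
        chunk = min(lam, 30.0)
        threshold = math.exp(-chunk)
        k, prod = 0, rng.random()
        while prod > threshold:
            k += 1
            prod *= rng.random()
        total += k
        lam -= chunk
    return total


def poisson_lognormal_counts(rng, mean, sigma, n):
    """Within-sample sketch: Poisson counts whose rate carries lognormal noise."""
    offset = -0.5 * sigma ** 2  # assumption: scale the noise to have mean one
    return [
        sample_poisson(rng, mean * rng.lognormvariate(offset, sigma))
        for _ in range(n)
    ]


def gamma_gamma_abundances(rng, n_samples, shape, prior_shape, prior_rate):
    """Between-sample sketch: Gamma abundances sharing a Gamma-distributed rate."""
    # random.gammavariate takes (shape, scale), so rates are inverted here.
    rate = rng.gammavariate(prior_shape, 1.0 / prior_rate)
    return [rng.gammavariate(shape, 1.0 / rate) for _ in range(n_samples)]


if __name__ == "__main__":
    rng = random.Random(0)
    counts = poisson_lognormal_counts(rng, mean=20.0, sigma=0.5, n=5000)
    # Hypothetical values: the variance should clearly exceed the mean.
    print("mean:", statistics.mean(counts), "variance:", statistics.variance(counts))
    print("abundances:", gamma_gamma_abundances(rng, 4, 10.0, 2.0, 1.0))
```

With lognormal noise on the rate, the simulated counts have a variance larger than their mean, which is the over-dispersion that a plain Poisson model cannot capture.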
Pages: 11