TCC: an R package for comparing tag count data with robust normalization strategies

Cited by: 425
Authors
Sun, Jianqiang [1]
Nishiyama, Tomoaki [2]
Shimizu, Kentaro [1]
Kadota, Koji [1]
Affiliations
[1] Univ Tokyo, Grad Sch Agr & Life Sci, Bunkyo Ku, Tokyo 1138657, Japan
[2] Kanazawa Univ, Adv Sci Res Ctr, Kanazawa, Ishikawa 9200934, Japan
Source
BMC BIOINFORMATICS | 2013 / Volume 14
Keywords
DIFFERENTIAL EXPRESSION ANALYSIS; RNA-SEQ; GENES; REPRODUCIBILITY; BIOCONDUCTOR;
DOI
10.1186/1471-2105-14-219
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Differential expression analysis based on "next-generation" sequencing technologies is a fundamental means of studying RNA expression. We recently developed a multi-step normalization method (called TbT) for two-group RNA-seq data with replicates and demonstrated that the statistical methods available in four R packages (edgeR, DESeq, baySeq, and NBPSeq) together with TbT can produce a well-ranked gene list in which true differentially expressed genes (DEGs) are top-ranked and non-DEGs are bottom-ranked. However, the advantages of the current TbT method come at the cost of a huge computation time. Moreover, the R packages did not have normalization methods based on such a multi-step strategy.
Results: TCC (an acronym for Tag Count Comparison) is an R package that provides a series of functions for differential expression analysis of tag count data. The package incorporates multi-step normalization methods, whose strategy is to remove potential DEGs before performing the data normalization. The normalization function based on this DEG elimination strategy (DEGES) includes (i) the original TbT method based on DEGES for two-group data with or without replicates, (ii) much faster methods for two-group data with or without replicates, and (iii) methods for multi-group comparison. TCC provides a simple unified interface to perform such analyses with combinations of functions provided by edgeR, DESeq, and baySeq. Additionally, a function for generating simulation data under various conditions and alternative DEGES procedures consisting of functions in the existing packages are provided. Bioinformatics scientists can use TCC to evaluate their methods, and biologists familiar with other R packages can easily learn what is done in TCC.
Conclusion: DEGES in TCC is essential for accurate normalization of tag count data, especially when up- and down-regulated DEGs in one of the samples are extremely biased in their number. TCC is useful for analyzing tag count data in various scenarios ranging from unbiased to extremely biased differential expression.
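The DEG elimination strategy described in the abstract (normalize, identify potential DEGs, then normalize again without them) can be illustrated with a small toy example. The following is a minimal sketch in Python, not TCC's implementation: it assumes a single pair of samples, substitutes total-count scaling for the normalization methods of edgeR or DESeq, and substitutes a ranking by absolute log fold change for a statistical test. The function names, the `flag_fraction` parameter, and the toy counts are all hypothetical.

```python
import math


def scaling_factor(a, b, genes):
    """Depth ratio of sample b to sample a, from total counts over `genes`."""
    return sum(b[g] for g in genes) / sum(a[g] for g in genes)


def deges_scaling_factor(a, b, flag_fraction=0.3, iterations=1):
    """Normalize, flag potential DEGs, then renormalize without them."""
    genes = list(range(len(a)))
    factor = scaling_factor(a, b, genes)  # step 1: provisional factor, all genes
    n_flag = round(flag_fraction * len(genes))
    for _ in range(iterations):
        # Step 2: rank genes by absolute log fold change after normalization
        # (a stand-in for a real statistical test); +1 guards against zeros.
        def abs_lfc(g):
            return abs(math.log2((b[g] + 1) / (a[g] + 1) / factor))

        flagged = set(sorted(genes, key=abs_lfc, reverse=True)[:n_flag])
        # Step 3: recompute the factor from the genes that were not flagged.
        kept = [g for g in genes if g not in flagged]
        factor = scaling_factor(a, b, kept)
    return factor


# Toy data (hypothetical): sample b is sequenced twice as deeply as sample a.
# The last three genes are additionally up-regulated 4-fold in b only,
# which mimics an extremely biased DEG pattern.
a = [50, 80, 100, 120, 150, 200, 300, 60, 90, 110]
b = [2 * x for x in a[:7]] + [8 * x for x in a[7:]]

naive = scaling_factor(a, b, range(len(a)))
refined = deges_scaling_factor(a, b)
print(f"naive factor:   {naive:.3f}")
print(f"refined factor: {refined:.3f}")
```

In this toy setting the one-sided DEGs inflate the naive factor to roughly 3.24, while the factor computed after removing the flagged genes recovers the true depth ratio of 2. The sketch depends on `flag_fraction` being close to the real proportion of DEGs, which is a simplification made here for illustration.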
Pages: 13