An evaluation of RNA-seq differential analysis methods

Cited by: 36
Authors
Li, Dongmei [1]
Zand, Martin S. [1, 2]
Dye, Timothy D. [3]
Goniewicz, Maciej L. [4]
Rahman, Irfan [5]
Xie, Zidian [1]
Affiliations
[1] Univ Rochester, Sch Med & Dent, Clin & Translat Sci Inst, Rochester, NY USA
[2] Univ Rochester, Sch Med & Dent, Dept Med, Div Nephrol, Rochester, NY USA
[3] Univ Rochester, Sch Med & Dent, Dept Obstet & Gynecol, Rochester, NY USA
[4] Roswell Park Comprehens Canc Ctr, Dept Hlth Behav, Buffalo, NY USA
[5] Univ Rochester, Sch Med & Dent, Dept Environm Med, Rochester, NY USA
Source
PLOS ONE | 2022 / Volume 17 / Issue 09
Funding
National Institutes of Health (USA)
Keywords
EXPRESSION ANALYSIS;
DOI
10.1371/journal.pone.0264246
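Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract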
RNA-seq is a high-throughput sequencing technology widely used for gene transcript discovery and quantification under different biological or biomedical conditions. A fundamental research question in most RNA-seq experiments is the identification of differentially expressed genes among experimental conditions or sample groups. Numerous statistical methods for RNA-seq differential analysis have been proposed since the emergence of the RNA-seq assay. To evaluate popular differential analysis methods available in open-source R and Bioconductor packages, we conducted multiple simulation studies comparing the performance of eight methods (edgeR, DESeq, DESeq2, baySeq, EBSeq, NOISeq, SAMSeq, Voom). The comparisons covered scenarios with equal or unequal library sizes, different distributional assumptions, and different sample sizes. We measured performance using false discovery rate (FDR) control, power, and stability. No significant differences were observed for FDR control, power, or stability across methods, whether with equal or unequal library sizes. For RNA-seq count data following a negative binomial distribution, EBSeq performed better than the other methods in FDR control, power, and stability when the sample size was 3 in each group. When the sample size increased to 6 or 12 in each group, DESeq2 performed slightly better than the other methods. All methods except DESeq improved in performance when the sample size increased to 12 in each group. For RNA-seq count data following a log-normal distribution, both DESeq and DESeq2 performed better than the other methods in FDR control, power, and stability across all sample sizes. Real RNA-seq experimental data were also used to compare the total number of discoveries and the stability of discoveries for each method.
For RNA-seq data analysis, the EBSeq method is recommended for studies with sample sizes as small as 3 in each group, and the DESeq2 method is recommended for sample sizes of 6 or more in each group when the data follow the negative binomial distribution. Both DESeq and DESeq2 are recommended when the data follow the log-normal distribution.
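The abstract names FDR control and power as performance measures but does not spell out how they are computed. As a rough illustration only, the sketch below uses the textbook definitions for a single simulated dataset: empirical FDR as the fraction of called genes that are not truly differentially expressed, and power as the fraction of truly differentially expressed genes that are called. The function name `fdr_and_power` and the toy data are hypothetical, and the paper's exact definitions (and its stability measure) may differ.

```python
def fdr_and_power(truly_de, called_de):
    """Empirical FDR and power for one simulated dataset (assumed definitions).

    truly_de, called_de: equal-length sequences of booleans, one entry per gene.
    """
    if len(truly_de) != len(called_de):
        raise ValueError("inputs must have one entry per gene")
    pairs = list(zip(truly_de, called_de))
    true_pos = sum(1 for t, c in pairs if t and c)       # correct discoveries
    false_pos = sum(1 for t, c in pairs if c and not t)  # false discoveries
    n_called = true_pos + false_pos
    n_true = sum(1 for t in truly_de if t)
    # Convention assumed here: report 0.0 when the denominator is zero.
    fdr = false_pos / n_called if n_called else 0.0
    power = true_pos / n_true if n_true else 0.0
    return fdr, power


# Hypothetical toy data: 10 genes, 4 truly differentially expressed.
truth = [True, True, True, True, False, False, False, False, False, False]
calls = [True, True, True, False, True, False, False, False, False, False]
print(fdr_and_power(truth, calls))  # (0.25, 0.75)
```

In a simulation study of this kind, such per-dataset values would typically be averaged over many simulated replicates for each method and sample size.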
Pages: 19