DFseq: Distribution-Free Method to Detect Differential Gene Expression for RNA-Sequencing Data

Times cited: 0
Authors
Yang, Shengping [1]
Wachtel, Mitchell S. [1]
Wu, Jiangrong [2]
Affiliations
[1] Texas Tech Univ, Dept Pathol, Hlth Sci Ctr, Lubbock, TX 79430 USA
[2] Univ Kentucky, Dept Biostat, Lexington, KY 40506 USA
Keywords
Differential expression; RNA-seq; distribution-free; false discovery rate; nonparametric approach; normalization; powerful
DOI
10.1109/TCBB.2018.2866994
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Many current RNA-sequencing data analysis methods compare expression one gene at a time, taking little account of the correlations among genes. In this study, we propose a method to convert such a one-dimensional comparison approach into a two-dimensional evaluation of the ratio of the standard deviations (SDs) of two constructed random variables. This method allows the identification of differentially expressed genes while controlling a preset significance level conditional on the read count mean-variance relationship. Meanwhile, correlations among genes are naturally accommodated because genes with similar distributions cluster together in the proposed sigma-sigma plot. The proposed method is designated DFseq (distribution-free) because it does not depend on a parametric distribution to fit read counts. As a result, compared with parametric methods, DFseq can effectively handle genes with a bimodal-like distribution and/or excessive zero read counts, as well as genes with outlying observations. In addition, DFseq is an ideal platform for comparing the performance of different methods for detecting differential gene expression.
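The abstract does not spell out how DFseq constructs its two random variables, so what follows is only a loose, hypothetical Python sketch of the general sigma-sigma idea. Each gene becomes a point (within-group SD, pooled SD), and the ratio of the two SDs is calibrated with a label-permutation test. The function names `sigma_pair`, `sd_ratio` and `permutation_p_value`, the pooled-versus-within choice of SDs, and the use of permutation (in place of whatever threshold DFseq derives from the mean-variance relationship) are all assumptions made for illustration; this is not the published method.

```python
import math
import random
import statistics
from typing import Sequence, Tuple


def sigma_pair(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, float]:
    """Return one gene's hypothetical sigma-sigma coordinates.

    ASSUMPTION (not DFseq's construction): x = within-group SD, i.e. spread
    around each group's own mean; y = pooled SD, i.e. spread around the
    grand mean, which grows when the two group means differ.
    """
    mean_a = statistics.fmean(group_a)
    mean_b = statistics.fmean(group_b)
    ss_within = sum((x - mean_a) ** 2 for x in group_a) + sum((x - mean_b) ** 2 for x in group_b)
    pooled = list(group_a) + list(group_b)
    grand = statistics.fmean(pooled)
    ss_total = sum((x - grand) ** 2 for x in pooled)
    n = len(pooled)
    return math.sqrt(ss_within / (n - 2)), math.sqrt(ss_total / (n - 1))


def sd_ratio(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Ratio of pooled SD to within-group SD; large values suggest separated groups."""
    sd_within, sd_pooled = sigma_pair(group_a, group_b)
    if sd_within == 0.0:
        # Degenerate gene (e.g. constant counts within each group).
        # Both SDs zero (e.g. all-zero counts): nothing to compare, report 1.0.
        return math.inf if sd_pooled > 0.0 else 1.0
    return sd_pooled / sd_within


def permutation_p_value(group_a: Sequence[float], group_b: Sequence[float],
                        n_perm: int = 999, seed: int = 0) -> float:
    """Label-permutation p-value for the SD ratio (swapped-in calibration step)."""
    rng = random.Random(seed)
    observed = sd_ratio(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Small tolerance so floating-point ties with the observed labeling still count.
        if sd_ratio(pooled[:k], pooled[k:]) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    null_gene = ([10, 12, 11, 13], [10, 12, 11, 13])
    de_gene = ([10, 12, 11, 13], [50, 52, 51, 53])
    for name, (a, b) in {"null": null_gene, "DE": de_gene}.items():
        print(name, sigma_pair(a, b), sd_ratio(a, b), permutation_p_value(a, b))
```

On a plot of these coordinates, genes without a group effect would scatter around a common trend, while a gene whose pooled SD is far larger than its within-group SD would stand out. The permutation step is one generic distribution-free way to attach a significance level; it requires no parametric model for the read counts, which is the property the abstract emphasizes.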
Pages: 558-565
Page count: 8