BNP-Seq: Bayesian Nonparametric Differential Expression Analysis of Sequencing Count Data

Cited by: 20
Authors
Dadaneh, Siamak Zamani [1]
Qian, Xiaoning [1, 2, 3]
Zhou, Mingyuan [2, 3]
Affiliations
[1] Texas A&M Univ, Ctr Bioinformat & Genom Syst Engn, Dept Elect & Comp Engn, College Stn, TX USA
[2] Univ Texas Austin, Dept Informat Risk & Operat Management, Austin, TX 78712 USA
[3] Univ Texas Austin, Dept Stat & Data Sci, Austin, TX 78712 USA
Funding
U.S. National Science Foundation
Keywords
Markov chain Monte Carlo; Negative binomial processes; Over-dispersion; RNA-Seq; Symmetric Kullback-Leibler divergence; RENAL-CELL CARCINOMA; RNA-SEQ; TUMOR; GENES; PROLIFERATION; NORMALIZATION; BIOCONDUCTOR; PATTERNS; PACKAGE; MODELS
DOI
10.1080/01621459.2017.1328358
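Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract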
We perform differential expression analysis of high-throughput sequencing count data under a Bayesian nonparametric framework, removing the sophisticated ad hoc preprocessing steps commonly required by existing algorithms. We propose to use the gamma (beta) negative binomial process, which takes into account different sequencing depths using sample-specific negative binomial probability (dispersion) parameters, to detect differentially expressed genes by comparing the posterior distributions of gene-specific negative binomial dispersion (probability) parameters. These model parameters are inferred by borrowing statistical strength across both the genes and the samples. Extensive experiments on both simulated and real-world RNA sequencing count data show that the proposed differential expression analysis algorithms clearly outperform previously proposed methods in terms of the areas under both the receiver operating characteristic and precision-recall curves. Supplementary materials for this article are available online.
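The abstract describes two ideas that a short sketch can make concrete: a sample-specific negative binomial probability parameter p_i absorbs sequencing depth, and a gene is flagged by comparing the posteriors of its gene-specific parameter r_j across conditions. The sketch below assumes the NB(r, p) parameterization with mean r*p/(1-p), so a larger p_i (a more deeply sequenced sample) raises the expected count for the same r_j. Because the keywords mention the symmetric Kullback-Leibler divergence, it scores a gene by moment-fitting a gamma distribution to hypothetical posterior draws of r_j from each condition and computing the closed-form symmetric KL between the two fits. This gamma simplification and the helper names (`nb_mean_var`, `fit_gamma_moments`, `de_score`) are illustrative assumptions, not the paper's actual inference or scoring procedure.

```python
import math
import random


def nb_mean_var(r, p):
    """Mean and variance of NB(r, p), assuming the parameterization
    mean = r*p/(1-p), variance = r*p/(1-p)**2 (an assumption here)."""
    mean = r * p / (1.0 - p)
    var = r * p / (1.0 - p) ** 2
    return mean, var


def digamma(x):
    """Digamma function: shift x upward with psi(x) = psi(x+1) - 1/x,
    then apply the standard asymptotic series for large x."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def kl_gamma(a1, b1, a2, b2):
    """KL(Gamma(shape=a1, rate=b1) || Gamma(shape=a2, rate=b2)), closed form."""
    return ((a1 - a2) * digamma(a1) - math.lgamma(a1) + math.lgamma(a2)
            + a2 * (math.log(b1) - math.log(b2)) + a1 * (b2 - b1) / b1)


def symmetric_kl_gamma(a1, b1, a2, b2):
    """Symmetric KL taken here as the sum of both directed divergences."""
    return kl_gamma(a1, b1, a2, b2) + kl_gamma(a2, b2, a1, b1)


def fit_gamma_moments(samples):
    """Hypothetical helper: method-of-moments gamma fit, returns (shape, rate)."""
    n = len(samples)
    m = sum(samples) / n
    v = sum((s - m) ** 2 for s in samples) / (n - 1)
    return m * m / v, m / v


def de_score(draws_a, draws_b):
    """Hypothetical DE score for one gene: symmetric KL between gamma fits
    to posterior draws of r_j under condition A and condition B."""
    a1, b1 = fit_gamma_moments(draws_a)
    a2, b2 = fit_gamma_moments(draws_b)
    return symmetric_kl_gamma(a1, b1, a2, b2)


if __name__ == "__main__":
    # Same r_j, deeper sequencing (larger p_i) -> larger expected count.
    print(nb_mean_var(5.0, 0.5)[0], nb_mean_var(5.0, 0.8)[0])

    rng = random.Random(0)
    # Simulated stand-ins for MCMC draws; gammavariate takes (shape, scale).
    cond_a = [rng.gammavariate(20.0, 1 / 10) for _ in range(2000)]
    cond_b_same = [rng.gammavariate(20.0, 1 / 10) for _ in range(2000)]
    cond_b_diff = [rng.gammavariate(20.0, 1 / 4) for _ in range(2000)]
    print(de_score(cond_a, cond_b_same), de_score(cond_a, cond_b_diff))
```

A gene whose posteriors for r_j barely move between conditions gets a score near zero, while a shifted posterior gets a large score; ranking genes by such a score is what the ROC and precision-recall areas in the abstract would then evaluate.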
Pages: 81-94
Number of pages: 14