Differential expression analysis of RNA sequencing data by incorporating non-exonic mapped reads

Times cited: 11
Authors
Chen, Hung-I Harry [1, 2]
Liu, Yuanhang [1, 3]
Zou, Yi [1]
Lai, Zhao [1]
Sarkar, Devanand [5, 6, 7]
Huang, Yufei [2]
Chen, Yidong [1, 4]
Affiliations
[1] Univ Texas Hlth Sci Ctr San Antonio, Greehey Childrens Canc Res Inst, San Antonio, TX 78229 USA
[2] Univ Texas San Antonio, Dept Elect & Comp Engn, San Antonio, TX 78249 USA
[3] Univ Texas Hlth Sci Ctr San Antonio, Dept Cellular & Struct Biol, San Antonio, TX 78229 USA
[4] Univ Texas Hlth Sci Ctr San Antonio, Dept Epidemiol & Biostat, San Antonio, TX 78229 USA
[5] Virginia Commonwealth Univ, Dept Human, Richmond, VA 23284 USA
[6] Virginia Commonwealth Univ, Dept Mol Genet, Richmond, VA 23284 USA
[7] Virginia Commonwealth Univ, VCU Inst Mol Med VIMM, Richmond, VA 23284 USA
Funding
US National Institutes of Health; US National Science Foundation;
Keywords
TRANSCRIPTION; MODEL;
DOI
10.1186/1471-2164-16-S7-S14
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005; 0836; 090102; 100705;
Abstract
Background: RNA sequencing (RNA-seq) is a powerful tool for genome-wide expression profiling of biological samples, offering high throughput and high resolution. Many algorithms exist for quantifying expression levels and detecting differential gene expression, but none of them takes into account the misaligned reads that map to non-exonic regions. We developed a novel algorithm, XBSeq, built on a statistical model that assumes the observed signal is the convolution of the true expression signal and sequencing noise. Reads mapped to non-exonic regions are treated as sequencing noise, which is assumed to follow a Poisson distribution. Given measurable observed and noise signals from RNA-seq data, the true expression signal, assumed to follow a negative binomial distribution, can be delineated, allowing accurate detection of differentially expressed genes.
Results: We implemented the XBSeq algorithm and evaluated it on a set of expression datasets simulated under different conditions, using a combination of negative binomial and Poisson distributions with parameters derived from real RNA-seq data. We compared the performance of our method with that of other commonly used differential expression analysis algorithms. We also evaluated how the true and false positive rates change with the number of biological replicates, the differential fold change, and the expression level in non-exonic regions. Finally, we tested the algorithm on a set of real RNA-seq data and reported the detection results that were shared by, and that differed between, the algorithms.
Conclusions: We proposed XBSeq, a differential expression analysis algorithm for RNA-seq data that takes non-exonic mapped reads into consideration. When background noise is at baseline level, XBSeq and DESeq perform about equally well. As the number of non-exonic mapped reads increases, however, our method surpasses DESeq and the other algorithms. Only under very low read counts did XBSeq show a slightly higher false discovery rate, which may be improved by adjusting the background noise effect in this situation. Taken together, by considering non-exonic mapped reads, XBSeq can provide accurate expression measurements and thus detect differentially expressed genes even in noisy conditions.
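The signal model described in the abstract (observed counts as true negative binomial expression plus Poisson noise, with non-exonic reads serving as a measurement of that noise) can be illustrated with a small simulation. The sketch below is not the XBSeq implementation: it uses a simple method-of-moments subtraction, and the function names `simulate_gene` and `estimate_true_moments`, along with all parameter values, are hypothetical choices made for illustration only.

```python
import numpy as np


def simulate_gene(rng, true_mean, dispersion, noise_rate, n_reps):
    """Simulate one gene: observed exonic counts plus a background measurement.

    All names and parameters here are hypothetical, chosen for illustration.
    """
    # NumPy parameterises the negative binomial by (n, p); convert from
    # mean and dispersion so that var = mean + dispersion * mean**2.
    n = 1.0 / dispersion
    p = n / (n + true_mean)
    true_signal = rng.negative_binomial(n, p, size=n_reps)
    # Poisson noise that contaminates the exonic counts.
    exonic_noise = rng.poisson(noise_rate, size=n_reps)
    observed = true_signal + exonic_noise
    # Independent draw standing in for reads counted in non-exonic regions.
    background = rng.poisson(noise_rate, size=n_reps)
    return observed, background


def estimate_true_moments(observed, background):
    """Method-of-moments estimate of the true signal's mean and variance.

    Assumes signal and noise are independent, so means and variances add.
    Negative estimates are clipped to zero.
    """
    mean_true = max(observed.mean() - background.mean(), 0.0)
    var_true = max(observed.var(ddof=1) - background.var(ddof=1), 0.0)
    return mean_true, var_true


rng = np.random.default_rng(0)
observed, background = simulate_gene(rng, 100.0, 0.1, 20.0, 200_000)
mean_true, var_true = estimate_true_moments(observed, background)
print(f"estimated true mean: {mean_true:.1f}, variance: {var_true:.1f}")
```

With a very large number of simulated replicates the estimates approach the simulated true mean and variance. Real experiments have only a handful of replicates, so a practical method would need a more careful estimation procedure than this subtraction.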
Pages: 13