CEDER: Accurate Detection of Differentially Expressed Genes by Combining Significance of Exons Using RNA-Seq

Cited by: 10
Authors
Wan, Lin [1, 2, 3]
Sun, Fengzhu [1, 4]
Affiliations
[1] Univ So Calif, Mol & Computat Biol Program, Los Angeles, CA 90089 USA
[2] Chinese Acad Sci, Natl Ctr Math & Interdisciplinary Sci, Beijing 100190, Peoples R China
[3] Chinese Acad Sci, Key Lab Syst & Control, Acad Math & Syst Sci, Beijing 100190, Peoples R China
[4] Tsinghua Univ, TNLIST Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
RNA-Seq; gene expression; differentially expressed gene; high-throughput sequencing; combined p-value statistic; ISOFORM EXPRESSION; REPRODUCIBILITY; IMPROVE; PACKAGE;
DOI
10.1109/TCBB.2012.83
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
RNA-Seq is widely used in transcriptome studies, and detecting differentially expressed genes (DEGs) between two classes of individuals, e.g., cases versus controls, is of fundamental importance. Many statistical methods for DEG detection from RNA-Seq data have been developed, and most of them are based on the read counts mapped to individual genes. However, genes are composed of exons, and the distribution of reads across the exons of a gene can be heterogeneous. We hypothesize that the detection accuracy of differentially expressed genes can be increased by analyzing individual exons within a gene and then combining the results of the exons. We therefore developed a novel program, termed CEDER, to accurately detect DEGs by combining the significance of the exons. CEDER first tests each exon for differential expression, yielding a p-value per exon, and then integrates the p-values of the exons in a gene into a score indicating the potential for that gene to be differentially expressed. We showed that CEDER can significantly increase the accuracy of existing methods for detecting DEGs on two benchmark RNA-Seq data sets and on simulated data sets.
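The two-step idea in the abstract (one p-value per exon, then a gene-level score from the combined exon p-values) can be illustrated with a minimal sketch. The abstract does not say which combining statistic CEDER uses, so the sketch below assumes Fisher's method purely as an illustration; the function name `fisher_combined_pvalue`, the gene labels, and the exon p-values are hypothetical and not taken from the paper. Fisher's method also assumes the exon-level tests are independent, which may not hold for real exons of the same gene.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method (illustrative only).

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, where k = len(pvalues).
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # X / 2
    # For even degrees of freedom 2k the chi-square survival function has
    # the closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical exon-level p-values per gene (made-up numbers).
exon_pvalues = {
    "geneA": [0.04, 0.20, 0.03],
    "geneB": [0.60, 0.75],
    "geneC": [0.01],
}

# Gene-level score: smaller combined p-value means stronger evidence.
gene_scores = {g: fisher_combined_pvalue(ps) for g, ps in exon_pvalues.items()}
for gene in sorted(gene_scores, key=gene_scores.get):
    print(gene, gene_scores[gene])
```

A single-exon gene keeps its exon p-value unchanged, while a gene with several moderately significant exons can score higher than any one of its exons alone, which is the intuition behind combining exon-level evidence.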
Pages: 1281-1292
Page count: 12