Genome-guided transcript assembly by integrative analysis of RNA sequence data

Cited by: 39
Authors
Boley, Nathan [1]
Stoiber, Marcus H. [1]
Booth, Benjamin W. [2]
Wan, Kenneth H. [2]
Hoskins, Roger A. [2]
Bickel, Peter J. [3]
Celniker, Susan E. [2, 3]
Brown, James B. [2, 3]
Affiliations
[1] Univ Calif Berkeley, Dept Biostat, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Dept Genome Dynam, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Keywords
SEQ DATA; MESSENGER-RNA; EXPRESSION; QUANTIFICATION; NORMALIZATION; LOCALIZATION; PROMOTER; REVEALS
DOI
10.1038/nbt.2850
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
The identification of full-length transcripts entirely from short-read RNA sequencing data (RNA-seq) remains a challenge in the annotation of genomes. Here we describe an automated pipeline for genome annotation that integrates RNA-seq and gene-boundary data sets, which we call Generalized RNA Integration Tool, or GRIT. Applying GRIT to Drosophila melanogaster short-read RNA-seq, cap analysis of gene expression (CAGE) and poly(A)-site-seq data collected for the modENCODE project, we recovered the vast majority of previously annotated transcripts and doubled the total number of transcripts cataloged. We found that 20% of protein-coding genes encode multiple protein-localization signals and that, in 20-d-old adult fly heads, genes with multiple polyadenylation sites are more common than genes with alternative splicing or alternative promoters. GRIT demonstrates 30% higher precision and recall than the most widely used transcript assembly tools. GRIT will facilitate the automated generation of high-quality genome annotations without the need for extensive manual annotation.
Pages: 341 / U198
Number of pages: 9