Genome-guided transcript assembly by integrative analysis of RNA sequence data

Cited by: 39
Authors
Boley, Nathan [1]
Stoiber, Marcus H. [1]
Booth, Benjamin W. [2]
Wan, Kenneth H. [2]
Hoskins, Roger A. [2]
Bickel, Peter J. [3]
Celniker, Susan E. [2, 3]
Brown, James B. [2, 3]
Affiliations
[1] Univ Calif Berkeley, Dept Biostat, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Dept Genome Dynam, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Keywords
SEQ DATA; MESSENGER-RNA; EXPRESSION; QUANTIFICATION; NORMALIZATION; LOCALIZATION; PROMOTER; REVEALS
DOI
10.1038/nbt.2850
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification code
071005; 0836; 090102; 100705
Abstract
The identification of full-length transcripts entirely from short-read RNA sequencing data (RNA-seq) remains a challenge in the annotation of genomes. Here we describe an automated pipeline for genome annotation that integrates RNA-seq and gene-boundary data sets, which we call Generalized RNA Integration Tool, or GRIT. Applying GRIT to Drosophila melanogaster short-read RNA-seq, cap analysis of gene expression (CAGE) and poly(A)-site-seq data collected for the modENCODE project, we recovered the vast majority of previously annotated transcripts and doubled the total number of transcripts cataloged. We found that 20% of protein-coding genes encode multiple protein-localization signals and that, in 20-d-old adult fly heads, genes with multiple polyadenylation sites are more common than genes with alternative splicing or alternative promoters. GRIT demonstrates 30% higher precision and recall than the most widely used transcript assembly tools. GRIT will facilitate the automated generation of high-quality genome annotations without the need for extensive manual annotation.
Pages: 341 / U198
Page count: 9
Related papers
41 in total
  • [1] Bickel P. J., 2001, MATH STAT, V1, P394
  • [2] Diversity and dynamics of the Drosophila transcriptome
    Brown, James B.
    Boley, Nathan
    Eisman, Robert
    May, Gemma E.
    Stoiber, Marcus H.
    Duff, Michael O.
    Booth, Ben W.
    Wen, Jiayu
    Park, Soo
    Suzuki, Ana Maria
    Wan, Kenneth H.
    Yu, Charles
    Zhang, Dayu
    Carlson, Joseph W.
    Cherbas, Lucy
    Eads, Brian D.
    Miller, David
    Mockaitis, Keithanne
    Roberts, Johnny
    Davis, Carrie A.
    Frise, Erwin
    Hammonds, Ann S.
    Olson, Sara
    Shenker, Sol
    Sturgill, David
    Samsonova, Anastasia A.
    Weiszmann, Richard
    Robinson, Garret
    Hernandez, Juan
    Andrews, Justen
    Bickel, Peter J.
    Carninci, Piero
    Cherbas, Peter
    Gingeras, Thomas R.
    Hoskins, Roger A.
    Kaufman, Thomas C.
    Lai, Eric C.
    Oliver, Brian
    Perrimon, Norbert
    Graveley, Brenton R.
    Celniker, Susan E.
    [J]. NATURE, 2014, 512 (7515) : 393 - 399
  • [3] Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments
    Bullard, James H.
    Purdom, Elizabeth
    Hansen, Kasper D.
    Dudoit, Sandrine
    [J]. BMC BIOINFORMATICS, 2010, 11
  • [4] The RNA polymerase II core promoter: a key component in the regulation of gene expression
    Butler, JEF
    Kadonaga, JT
    [J]. GENES & DEVELOPMENT, 2002, 16 (20) : 2583 - 2592
  • [5] Celotto AM, 2001, GENETICS, V159, P599
  • [6] Genome Analysis Reveals Interplay between 5′UTR Introns and Nuclear mRNA Export for Secretory and Mitochondrial Genes
    Cenik, Can
    Chua, Hon Nian
    Zhang, Hui
    Tarnawsky, Stefan P.
    Akef, Abdalla
    Derti, Adnan
    Tasan, Murat
    Moore, Melissa J.
    Palazzo, Alexander F.
    Roth, Frederick P.
    [J]. PLOS GENETICS, 2011, 7 (04):
  • [7] Incorporating RNA-seq data into the zebrafish Ensembl genebuild
    Collins, John E.
    White, Simon
    Searle, Stephen M. J.
    Stemple, Derek L.
    [J]. GENOME RESEARCH, 2012, 22 (10) : 2067 - 2078
  • [8] DNMT1-interacting RNAs block gene-specific DNA methylation
    Di Ruscio, Annalisa
    Ebralidze, Alexander K.
    Benoukraf, Touati
    Amabile, Giovanni
    Goff, Loyal A.
    Terragni, Jolyon
    Figueroa, Maria Eugenia
    Pontes, Lorena Lobo De Figueiredo
    Alberich-Jorda, Meritxell
    Zhang, Pu
    Wu, Mengchu
    D'Alo, Francesco
    Melnick, Ari
    Leone, Giuseppe
    Ebralidze, Konstantin K.
    Pradhan, Sriharsa
    Rinn, John L.
    Tenen, Daniel G.
    [J]. NATURE, 2013, 503 (7476) : 371 - +
  • [9] Predicting subcellular localization of proteins based on their N-terminal amino acid sequence
    Emanuelsson, O
    Nielsen, H
    Brunak, S
    von Heijne, G
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 2000, 300 (04) : 1005 - 1016
  • [10] Gelbart WM, 1999, NUCLEIC ACIDS RES, V27, P85, DOI 10.1093/nar/27.1.85