A simple guide to de novo transcriptome assembly and annotation

Cited by: 77
Authors
Raghavan, Venket [1]
Kraft, Louis [1]
Mesny, Fantin [1]
Rigerte, Linda [1]
Affiliations
[1] Max Planck Inst Biophys Chem, Quantitat & Computat Biol, D-37077 Gottingen, Germany
Keywords
de novo; transcriptome; assembly; annotation; tools; RNA-seq; RNA-SEQ DATA; DIFFERENTIAL EXPRESSION; FUNCTIONAL ANNOTATION; GENE-EXPRESSION; QUALITY ASSESSMENT; READ ALIGNMENT; CD-HIT; SEQUENCE; PROTEIN; GENOME
DOI
10.1093/bib/bbab563
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
A transcriptome constructed from short-read RNA sequencing (RNA-seq) is an easily attainable proxy catalog of protein-coding genes when genome assembly is unnecessary, expensive or difficult. In the absence of a sequenced genome to guide the reconstruction process, the transcriptome must be assembled de novo using only the information available in the RNA-seq reads. Subsequently, the sequences must be annotated in order to identify sequence-intrinsic and evolutionary features in them (for example, protein-coding regions). Although straightforward at first glance, de novo transcriptome assembly and annotation can quickly prove to be challenging undertakings. In addition to familiarizing themselves with the conceptual and technical intricacies of the tasks at hand and the numerous pre- and post-processing steps involved, those interested must also grapple with an overwhelmingly large choice of tools. The lack of standardized workflows, fast pace of development of new tools and techniques and paucity of authoritative literature have served to exacerbate the difficulty of the task even further. Here, we present a comprehensive overview of de novo transcriptome assembly and annotation. We discuss the procedures involved, including pre- and post-processing steps, and present a compendium of corresponding tools.
Pages: 30