A long-read and short-read transcriptomics approach provides the first high-quality reference transcriptome and genome annotation for Pseudotsuga menziesii (Douglas-fir)

Times cited: 0
Authors
Velasco, Vera Marjorie Elauria [1,2]
Ferreira, Alyssa [3]
Zaman, Sumaira [3]
Noordermeer, Devin [1,4]
Ensminger, Ingo [1,4]
Wegrzyn, Jill L. [3]
Affiliations
[1] Univ Toronto, Dept Biol, Mississauga, ON L5L 1C8, Canada
[2] Univ Toronto, Off Vice President, Res, Mississauga, ON L5L 1C8, Canada
[3] Univ Connecticut, Dept Evolut & Ecol, Storrs, CT 06269 USA
[4] Univ Toronto, Grad Dept Cell & Syst Biol, Toronto, ON M5S, Canada
Source
G3-GENES GENOMES GENETICS | 2023 / Vol. 13 / No. 2
Keywords
coastal Douglas-fir; de novo assembly; full-length isoform; functional annotation; genome annotation; interior Douglas-fir; long noncoding RNA; NovaSeq; PacBio Iso-Seq; Pseudotsuga menziesii var. glauca; Pseudotsuga menziesii var. menziesii; reference transcriptome; transcription factors; MESSENGER-RNA; SEQUENCE; ALIGNMENT; TOOL; PREDICTION; UTILITIES; EXPANSION; PROTEINS; DATABASE; GMAP
DOI
10.1093/g3journal/jkac304
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Douglas-fir (Pseudotsuga menziesii) is native to western North America. It grows under a wide range of environmental conditions and is an important timber tree. Although several studies have examined the gene expression responses of Douglas-fir to abiotic cues, the absence of high-quality transcriptome and genome data is a barrier to further investigation. As for most conifers, the available transcriptome and genome reference datasets for Douglas-fir remain fragmented and require refinement. We aimed to generate a highly accurate and complete reference transcriptome and genome annotation. We deep-sequenced the transcriptome of Douglas-fir needles from seedlings grown under either nonstress control conditions or combined heat and drought stress, using long-read (LR) and short-read (SR) sequencing platforms. We used 2 computational approaches, namely de novo and genome-guided LR transcriptome assembly. Compared with the genome-guided approach, the LR de novo assembly yielded 1.3X more high-quality transcripts, 1.85X more "complete" genes, and 2.7X more functionally annotated genes. We predicted 666 long noncoding RNAs and 12,778 unique protein-coding transcripts, including 2,016 putative transcription factors. We combined the LR de novo assembled transcriptome with paired-end SR data and a published single-end SR transcriptome to generate an improved genome annotation. Gene prediction was performed with BRAKER2, and the predictions were refined based on functional annotation, repetitive content, and transcriptome alignment. The resulting high-quality genome annotation has 51,419 unique gene models, derived from 322,631 initial predictions. Overall, our informatics approach provides a new reference Douglas-fir transcriptome assembly and genome annotation with considerably improved completeness and functional annotation.
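The abstract describes the annotation refinement only at a high level: BRAKER2 predictions were filtered using functional annotation, repetitive content, and transcriptome alignment, reducing 322,631 initial predictions to 51,419 gene models. The Python sketch below illustrates the general shape of such an evidence-based filter. It is not the authors' pipeline: the `GeneModel` fields, the `keep_model` rule, and the 0.5 repeat-fraction cutoff are hypothetical choices made for illustration, since this record does not give the actual criteria.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    """Hypothetical summary of the evidence attached to one predicted gene model."""
    gene_id: str
    has_functional_annotation: bool  # e.g. a hit in a protein/domain database (assumed field)
    repeat_fraction: float           # fraction of the model overlapping masked repeats (assumed field)
    transcript_supported: bool       # an aligned transcript supports the model (assumed field)


# Hypothetical cutoff, chosen for illustration only; not taken from the paper.
MAX_REPEAT_FRACTION = 0.5


def keep_model(m: GeneModel) -> bool:
    """Keep a model that is not mostly repeat and has annotation or transcript support."""
    if m.repeat_fraction > MAX_REPEAT_FRACTION:
        return False  # likely a transposable-element-derived prediction
    return m.has_functional_annotation or m.transcript_supported


def filter_models(models: list[GeneModel]) -> list[GeneModel]:
    """Return only the models that pass every evidence check."""
    return [m for m in models if keep_model(m)]


if __name__ == "__main__":
    models = [
        GeneModel("g1", True, 0.1, True),    # annotated, low repeat content: kept
        GeneModel("g2", False, 0.0, False),  # no supporting evidence: dropped
        GeneModel("g3", True, 0.8, True),    # mostly repeat: dropped
        GeneModel("g4", False, 0.2, True),   # transcript support only: kept
    ]
    kept = filter_models(models)
    print([m.gene_id for m in kept])  # ['g1', 'g4']
```

Applying the repeat check before the evidence check means a model that is mostly repeat is discarded even when it has a database hit, since repeat-derived proteins often produce spurious annotations. A real pipeline may weigh these criteria differently.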
Pages: 13