The Impact of cDNA Normalization on Long-Read Sequencing of a Complex Transcriptome

Cited by: 6
Authors
Hoang, Nam V. [1]
Furtado, Agnelo [2]
Perlo, Virginie [2]
Botha, Frederik C. [2,3]
Henry, Robert J. [2]
Affiliations
[1] Hue Univ, Coll Agr & Forestry, Hue, Vietnam
[2] Univ Queensland, Queensland Alliance Agr & Food Innovat, St Lucia, Qld, Australia
[3] Sugar Res Australia, Indooroopilly, Qld, Australia
Keywords
isoform sequencing; transcriptome normalization; transcript enrichment; normalization impact; sugarcane transcriptome; polyploid transcriptome; DUPLEX-SPECIFIC NUCLEASE; MESSENGER-RNA; NONCODING RNAS; WEB SERVER; GENOME; SUGARCANE; IDENTIFICATION; ANNOTATION; SACCHARUM; PATHWAYS;
DOI
10.3389/fgene.2019.00654
Chinese Library Classification number
Q3 [Genetics];
Discipline classification codes
071007 ; 090102 ;
Abstract
Normalization of cDNA is widely used to improve the coverage of rare transcripts in transcriptome analyses that employ next-generation sequencing. Recently, long-read technology has been emerging as a powerful tool for sequencing and constructing transcriptomes, especially for complex genomes containing highly similar transcripts and transcript-spliced isoforms. Here, we analyzed the transcriptome of sugarcane, a highly polyploid plant genome, by PacBio isoform sequencing (Iso-Seq) of two different cDNA library preparations, with and without a normalization step. The results demonstrated that, while the two libraries included many of the same transcripts, normalization removed many longer transcripts and detected many new, generally shorter transcripts. For the same input cDNA and data yield, the normalized library recovered more total transcript isoforms and a larger number of predicted gene families and orthologous groups, resulting in a higher representation of the sugarcane transcriptome compared to the non-normalized library. The non-normalized library, on the other hand, included a wider transcript length range, with more long transcripts above ~1.25 kb, more transcript isoforms per gene family, and more gene ontology terms per transcript. A large proportion of the unique transcripts, comprising ~52% of the normalized library, were expressed at a lower level than the unique transcripts from the non-normalized library across the three tissue types tested (leaf, stalk, and root). About 83% of the total 5,348 predicted long noncoding transcripts were derived from the normalized library, of which ~80% were derived from the lowly expressed fraction. Functional annotation of the unique transcripts suggested that each library enriched different functional transcript fractions. This demonstrated that the two approaches are complementary in obtaining a complete transcriptome of a complex genome at the sequencing depth used in this study.
Pages: 17