UNAGI: an automated pipeline for nanopore full-length cDNA sequencing uncovers novel transcripts and isoforms in yeast

Citations: 3
Authors
Al Kadi, Mohamad [1]
Jung, Nicolas [2]
Ito, Shingo [2]
Kameoka, Shoichiro [2, 3]
Hishida, Takashi [4]
Motooka, Daisuke [2, 5]
Nakamura, Shota [2, 5, 6]
Iida, Tetsuya [1, 2]
Okuzaki, Daisuke [5, 6, 7]
Affiliations
[1] Osaka Univ, Res Inst Microbial Dis, Dept Bacterial Infect, Suita, Osaka 5650871, Japan
[2] Osaka Univ, Res Inst Microbial Dis, Dept Infect Metagen, Suita, Osaka 5650871, Japan
[3] Cykinso Inc, Tokyo 1510053, Japan
[4] Gakushuin Univ, Dept Mol Biol, Grad Sch Sci, Tokyo 1710031, Japan
[5] Osaka Univ, Genome Informat Res Ctr, Res Inst Microbial Dis, Yamadaoka 3-1, Suita, Osaka 5650871, Japan
[6] Osaka Univ, Inst Open & Transdisciplinary Res Initiat, Integrated Frontier Res Med Sci Div, Suita, Osaka 5650871, Japan
[7] Osaka Univ, WPI Immunol Frontier Res Ctr, Human Immunol, Single Cell Genom, Suita, Osaka 5650871, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
Nanopore sequencing; Annotation; Isoforms; Differential gene expression; Stranding; Illumina; Full-length cDNA; LONG-NONCODING RNAS; SEQ; QUANTIFICATION; RECONSTRUCTION; ANNOTATION; EXPRESSION;
DOI
10.1007/s10142-020-00732-1
Chinese Library Classification
Q3 [Genetics];
Discipline classification codes
071007; 090102;
Abstract
Sequencing the entire RNA molecule leads to a better understanding of the transcriptome architecture. SMARTer (Switching Mechanism at 5'-End of RNA Template) is a technology aimed at generating full-length cDNA from low amounts of mRNA for sequencing by short-read sequencers such as those from Illumina. However, short-read sequencing such as Illumina technology includes fragmentation that results in bias and information loss. Here, we built a pipeline, UNAGI or UNAnnotated Gene Identifier, to process long reads obtained with nanopore sequencing and compared this pipeline with the standard Illumina pipeline by studying the Saccharomyces cerevisiae transcriptome in full-length cDNA samples generated from two different biological samples: haploid and diploid cells. Additionally, we processed the long reads with another long-read tool, FLAIR. Our strand-aware method revealed significant differential gene expression that was masked in Illumina data by antisense transcripts. Our pipeline, UNAGI, outperformed the Illumina pipeline and FLAIR in transcript reconstruction (sensitivity and specificity of 80% and 40% vs. 18% and 34% and 79% and 32%, respectively). Moreover, UNAGI discovered 3877 unannotated transcripts including 1282 intergenic transcripts while the Illumina pipeline discovered only 238 unannotated transcripts. For isoform profiling, UNAGI also outperformed the Illumina pipeline and FLAIR in terms of sensitivity (91% vs. 82% and 63%, respectively). But the low accuracy of nanopore sequencing led to a closer gap in terms of specificity with the Illumina pipeline (70% vs. 63%) and to a huge gap with FLAIR (70% vs. 0.02%).
Pages: 523-536
Page count: 14