Accurate Identification and Analysis of Human mRNA Isoforms Using Deep Long Read Sequencing

Cited by: 49
Authors
Tilgner, Hagen [1]
Raha, Debasish [2]
Habegger, Lukas [3,4]
Mohiuddin, Mohammed [5]
Gerstein, Mark [3,4]
Snyder, Michael [1]
Affiliations
[1] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
[2] Yale Univ, Dept Mol Cellular & Dev Biol, New Haven, CT 06120 USA
[3] Yale Univ, Program Computat Biol, New Haven, CT 06120 USA
[4] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06120 USA
[5] Roche, Branford, CT 06405 USA
Source
G3-GENES GENOMES GENETICS | 2013 / Vol. 3 / Issue 03
Funding
US National Institutes of Health;
Keywords
RNA; Roche sequencing; human; splicing; transcriptome; HUMAN GENOME; SEQ; TRANSCRIPTOMES; ANNOTATION; EXPRESSION; LANDSCAPE; GENCODE; CELLS;
DOI
10.1534/g3.112.004812
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Precise identification of RNA-coding regions and transcriptomes of eukaryotes is a significant problem in biology. Currently, eukaryote transcriptomes are analyzed using deep short-read sequencing experiments of complementary DNAs. The resulting short-reads are then aligned against a genome and annotated junctions to infer biological meaning. Here we use long-read complementary DNA datasets for the analysis of a eukaryotic transcriptome and generate two large datasets in the human K562 and HeLa S3 cell lines. Both data sets comprised at least 4 million reads and had median read lengths greater than 500 bp. We show that annotation-independent alignments of these reads provide partial gene structures that are very much in-line with annotated gene structures, 15% of which have not been obtained in a previous de novo analysis of short reads. For long-noncoding RNAs (i.e., lncRNA) genes, however, we find an increased fraction of novel gene structures among our alignments. Other important aspects of transcriptome analysis, such as the description of cell type-specific splicing, can be performed in an accurate, reliable and completely annotation-free manner, making it ideal for the analysis of transcriptomes of newly sequenced genomes. Furthermore, we demonstrate that long read sequence can be assembled into full-length transcripts with considerable success. Our method is applicable to all long read sequencing technologies.
Pages: 387-397
Page count: 11
Related papers
50 in total
  • [1] Identification of region-specific gene isoforms in the human brain using long-read transcriptome sequencing
    Shimada, Mihoko
    Omae, Yosuke
    Kakita, Akiyoshi
    Gabdulkhaev, Ramil
    Hitomi, Yuki
    Miyagawa, Taku
    Honda, Makoto
    Fujimoto, Akihiro
    Tokunaga, Katsushi
    SCIENCE ADVANCES, 2024, 10 (04)
  • [2] Long-Read Sequencing of Chicken Transcripts and Identification of New Transcript Isoforms
    Thomas, Sean
    Underwood, Jason G.
    Tseng, Elizabeth
    Holloway, Alisha K.
    PLOS ONE, 2014, 9 (04)
  • [3] Targeted sequencing analysis pipeline for species identification of human pathogenic fungi using long-read nanopore sequencing
    Langsiri, Nattapong
    Worasilchai, Navaporn
    Irinyi, Laszlo
    Jenjaroenpun, Piroon
    Wongsurawat, Thidathip
    Luangsa-ard, Janet Jennifer
    Meyer, Wieland
    Chindamporn, Ariya
    IMA FUNGUS, 2023, 14 (01)
  • [4] Targeted sequencing analysis pipeline for species identification of human pathogenic fungi using long-read nanopore sequencing
    Nattapong Langsiri
    Navaporn Worasilchai
    Laszlo Irinyi
    Piroon Jenjaroenpun
    Thidathip Wongsurawat
    Janet Jennifer Luangsa-ard
    Wieland Meyer
    Ariya Chindamporn
    IMA Fungus, 14
  • [5] Identification of Protein Isoforms Using Reference Databases Built from Long and Short Read RNA-Sequencing
    Tay, Aidan P.
    Hamey, Joshua J.
    Martyn, Gabriella E.
    Wilson, Laurence O. W.
    Wilkins, Marc R.
    JOURNAL OF PROTEOME RESEARCH, 2022, : 1628 - 1639
  • [6] Long-read nanopore sequencing provides fast and accurate identification of genetic variants in the human PRNP gene
    Athanasios, Dimitriadis
    Kroll, Francois
    Campbell, Tracy
    Collinge, John
    Mead, Simon
    Vire, Emmanuelle
    Collinge, John
    PRION, 2019, 13 : 56 - 56
  • [7] Identification of Alternative Polyadenylation in Cyanidioschyzon merolae Through Long-Read Sequencing of mRNA
    Scharfen, Leonard
    Zigackova, Dagmar
    Reimer, Kirsten A.
    Stark, Martha R.
    Slat, Viktor A.
    Francoeur, Nancy J.
    Wells, Melissa L.
    Zhou, Lecong
    Blackshear, Perry J.
    Neugebauer, Karla M.
    Rader, Stephen D.
    FRONTIERS IN GENETICS, 2022, 12
  • [8] Analyzing mRNA isoforms in human stem cell-derived retinal neurons using long-read transcriptomics
    Keuthan, Casey J.
    Parthiban, Sowmya
    Chang, Yen Yu
    Chang, Xiaoli
    Yan, Ethan
    Berlinicke, Cynthia
    Cavalier, Sheridan
    Anastasaki, Corina
    Gutmann, David
    Gamm, David M.
    Zhu, Yuan
    Timp, Winston
    Hicks, Stephanie
    Zack, Donald J.
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2024, 65 (07)
  • [9] SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms
    Pardo-Palacios, Francisco J.
    Arzalluz-Luque, Angeles
    Kondratova, Liudmyla
    Salguero, Pedro
    Mestre-Tomas, Jorge
    Amorin, Rocio
    Estevan-Morio, Eva
    Liu, Tianyuan
    Nanni, Adalena
    Mcintyre, Lauren
    Tseng, Elizabeth
    Conesa, Ana
    NATURE METHODS, 2024, 21 (05) : 793 - 797
  • [10] SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms
    Francisco J. Pardo-Palacios
    Angeles Arzalluz-Luque
    Liudmyla Kondratova
    Pedro Salguero
    Jorge Mestre-Tomás
    Rocío Amorín
    Eva Estevan-Morió
    Tianyuan Liu
    Adalena Nanni
    Lauren McIntyre
    Elizabeth Tseng
    Ana Conesa
    Nature Methods, 2024, 21 : 793 - 797