Accurate Identification and Analysis of Human mRNA Isoforms Using Deep Long Read Sequencing

Cited by: 49
Authors
Tilgner, Hagen [1]
Raha, Debasish [2]
Habegger, Lukas [3, 4]
Mohiuddin, Mohammed [5]
Gerstein, Mark [3, 4]
Snyder, Michael [1]
Affiliations
[1] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
[2] Yale Univ, Dept Mol Cellular & Dev Biol, New Haven, CT 06120 USA
[3] Yale Univ, Program Computat Biol, New Haven, CT 06120 USA
[4] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06120 USA
[5] Roche, Branford, CT 06405 USA
Source
G3-GENES GENOMES GENETICS | 2013 / Volume 3 / Issue 03
Funding
National Institutes of Health (USA);
Keywords
RNA; Roche sequencing; human; splicing; transcriptome; HUMAN GENOME; SEQ; TRANSCRIPTOMES; ANNOTATION; EXPRESSION; LANDSCAPE; GENCODE; CELLS;
DOI
10.1534/g3.112.004812
Chinese Library Classification number
Q3 [Genetics];
Discipline classification codes
071007; 090102;
Abstract
Precise identification of RNA-coding regions and transcriptomes of eukaryotes is a significant problem in biology. Currently, eukaryote transcriptomes are analyzed using deep short-read sequencing experiments of complementary DNAs. The resulting short reads are then aligned against a genome and annotated junctions to infer biological meaning. Here we use long-read complementary DNA datasets for the analysis of a eukaryotic transcriptome and generate two large datasets in the human K562 and HeLa S3 cell lines. Both datasets comprised at least 4 million reads and had median read lengths greater than 500 bp. We show that annotation-independent alignments of these reads provide partial gene structures that are very much in line with annotated gene structures, 15% of which have not been obtained in a previous de novo analysis of short reads. For long noncoding RNA (lncRNA) genes, however, we find an increased fraction of novel gene structures among our alignments. Other important aspects of transcriptome analysis, such as the description of cell type-specific splicing, can be performed in an accurate, reliable and completely annotation-free manner, making this approach ideal for the analysis of transcriptomes of newly sequenced genomes. Furthermore, we demonstrate that long-read sequences can be assembled into full-length transcripts with considerable success. Our method is applicable to all long-read sequencing technologies.
Pages: 387-397
Number of pages: 11
Related papers
50 in total
  • [21] Genome-wide identification of long non-coding RNA and mRNA profiling using RNA sequencing in subjects with sensitive skin
    Yang, Li
    Lyu, Lechun
    Wu, Wenjuan
    Lei, Dongyun
    Tu, Ying
    Xu, Dan
    Feng, Jiaqi
    He, Li
    ONCOTARGET, 2017, 8 (70) : 114894 - 114910
  • [22] Accurate targeted long-read DNA methylation and hydroxymethylation sequencing with TAPS
    Liu, Yibin
    Cheng, Jingfei
    Siejka-Zielinska, Paulina
    Weldon, Carika
    Roberts, Hannah
    Lopopolo, Maria
    Magri, Andrea
    D'Arienzo, Valentina
    Harris, James M.
    McKeating, Jane A.
    Song, Chun-Xiao
    GENOME BIOLOGY, 2020, 21 (01)
  • [23] Identification and characterization of occult human-specific LINE-1 insertions using long-read sequencing technology
    Zhou, Weichen
    Emery, Sarah B.
    Flasch, Diane A.
    Wang, Yifan
    Kwan, Kenneth Y.
    Kidd, Jeffrey M.
    Moran, John V.
    Mills, Ryan E.
    NUCLEIC ACIDS RESEARCH, 2020, 48 (03) : 1146 - 1163
  • [24] Identification and characterization of human KALRN mRNA and Kalirin protein isoforms
    Mould, Arne W.
    Wright, David J.
    Bornemann, Klaus D.
    Hengerer, Bastian
    Pinnock, Rob
    Drydale, Edward
    Bancroft, James
    Hall, Nicola A. L.
    von Delft, Annette
    Brennan, Paul E.
    Harrison, Paul J.
    Haerty, Wilfried
    Tunbridge, Elizabeth M.
    CEREBRAL CORTEX, 2024, 34 (12)
  • [25] Profiling the epigenome using long-read sequencing
    Liu, Tianyuan
    Conesa, Ana
    NATURE GENETICS, 2025, 57 (01) : 27 - 41
  • [26] Long-read sequencing data analysis for yeasts
    Yue, Jia-Xing
    Liti, Gianni
    NATURE PROTOCOLS, 2018, 13 (06) : 1213 - 1231
  • [27] Freddie: annotation-independent detection and discovery of transcriptomic alternative splicing isoforms using long-read sequencing
    Orabi, Baraa
    Xie, Ning
    McConeghy, Brian
    Dong, Xuesen
    Chauve, Cedric
    Hach, Faraz
    NUCLEIC ACIDS RESEARCH, 2023, 51 (02) : E11
  • [28] Long-read RNA sequencing atlas of human microglia isoforms elucidates disease-associated genetic regulation of splicing
    Humphrey, Jack
    Brophy, Erica
    Kosoy, Roman
    Zeng, Biao
    Coccia, Elena
    Mattei, Daniele
    Ravi, Ashvin
    Naito, Tatsuhiko
    Efthymiou, Anastasia G.
    Navarro, Elisa
    De Sanctis, Claudia
    Flores-Almazan, Victoria
    Muller, Benjamin Z.
    Snijders, Gijsje J. L. J.
    Allan, Amanda
    Muench, Alexandra
    Kitata, Reta Birhanu
    Kleopoulos, Steven P.
    Argyriou, Stathis
    Malakates, Periklis
    Psychogyiou, Konstantina
    Shao, Zhiping
    Francoeur, Nancy
    Tsai, Chia-Feng
    Gritsenko, Marina A.
    Monroe, Matthew E.
    Paurus, Vanessa L.
    Weitz, Karl K.
    Shi, Tujin
    Sebra, Robert
    Liu, Tao
    de Witte, Lot D.
    Goate, Alison M.
    Bennett, David A.
    Haroutunian, Vahram
    Hoffman, Gabriel E.
    Fullard, John F.
    Roussos, Panos
    Raj, Towfique
    NATURE GENETICS, 2025, : 604 - 615
  • [29] Long-read single-cell sequencing reveals expressions of hypermutation clusters of isoforms in human liver cancer cells
    Liu, Silvia
    Yu, Yan-Ping
    Ren, Bao-Guo
    Ben-Yehezkel, Tuval
    Obert, Caroline
    Smith, Mat
    Wang, Wenjia
    Ostrowska, Alina
    Soto-Gutierrez, Alejandro
    Luo, Jian-Hua
    ELIFE, 2024, 12
  • [30] Long-read human genome sequencing and its applications
    Logsdon, Glennis A.
    Vollger, Mitchell R.
    Eichler, Evan E.
    NATURE REVIEWS GENETICS, 2020, 21 (10) : 597 - 614