Accurate Identification and Analysis of Human mRNA Isoforms Using Deep Long Read Sequencing

Times cited: 49
Authors
Tilgner, Hagen [1]
Raha, Debasish [2]
Habegger, Lukas [3,4]
Mohiuddin, Mohammed [5]
Gerstein, Mark [3,4]
Snyder, Michael [1]
Affiliations
[1] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
[2] Yale Univ, Dept Mol Cellular & Dev Biol, New Haven, CT 06120 USA
[3] Yale Univ, Program Computat Biol, New Haven, CT 06120 USA
[4] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06120 USA
[5] Roche, Branford, CT 06405 USA
Source
G3-GENES GENOMES GENETICS | 2013 / Vol. 3 / Issue 3
Funding
US National Institutes of Health
Keywords
RNA; Roche sequencing; human; splicing; transcriptome; HUMAN GENOME; SEQ; TRANSCRIPTOMES; ANNOTATION; EXPRESSION; LANDSCAPE; GENCODE; CELLS;
DOI
10.1534/g3.112.004812
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Precise identification of RNA-coding regions and transcriptomes of eukaryotes is a significant problem in biology. Currently, eukaryote transcriptomes are analyzed using deep short-read sequencing of complementary DNAs. The resulting short reads are then aligned against a genome and annotated junctions to infer biological meaning. Here we use long-read complementary DNA datasets to analyze a eukaryotic transcriptome, generating two large datasets in the human K562 and HeLa S3 cell lines. Both datasets comprised at least 4 million reads and had median read lengths greater than 500 bp. We show that annotation-independent alignments of these reads yield partial gene structures that agree closely with annotated gene structures, 15% of which had not been obtained in a previous de novo analysis of short reads. For long noncoding RNA (lncRNA) genes, however, we find an increased fraction of novel gene structures among our alignments. Other important aspects of transcriptome analysis, such as the description of cell type-specific splicing, can be performed in an accurate, reliable and completely annotation-free manner, making the approach well suited to the analysis of transcriptomes of newly sequenced genomes. Furthermore, we demonstrate that long-read sequences can be assembled into full-length transcripts with considerable success. Our method is applicable to all long-read sequencing technologies.
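To make concrete what it means for an annotation-independent alignment to "agree" with an annotated gene structure, the following is a minimal sketch, not the authors' pipeline, that compares a spliced read's intron chain with annotated transcripts. It assumes introns are given as (start, end) coordinates on the same chromosome and strand, and that a partial read counts as matching when its introns form a consecutive run inside a transcript's intron chain; the function names and the toy annotation are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: an intron is (start, end) on one chromosome/strand.
Intron = Tuple[int, int]


def is_subchain(read_introns: List[Intron], transcript_introns: List[Intron]) -> bool:
    """True if read_introns occurs as a consecutive run within transcript_introns."""
    n, m = len(read_introns), len(transcript_introns)
    if n == 0 or n > m:
        return False
    # Slide a window of the read's length along the transcript's intron chain.
    return any(transcript_introns[i:i + n] == read_introns for i in range(m - n + 1))


def classify_read(
    read_introns: List[Intron], annotation: Dict[str, List[Intron]]
) -> Tuple[str, Optional[str]]:
    """Label a read 'unspliced', 'annotated' (with the first matching transcript) or 'novel'."""
    if not read_introns:
        return "unspliced", None
    for transcript_id, introns in annotation.items():
        if is_subchain(read_introns, introns):
            return "annotated", transcript_id
    return "novel", None


if __name__ == "__main__":
    # Toy annotation: T2 skips the middle exon of T1 (exon-skipping isoform).
    annotation = {
        "T1": [(100, 200), (300, 400), (500, 600)],
        "T2": [(100, 200), (500, 600)],
    }
    print(classify_read([(300, 400), (500, 600)], annotation))  # partial T1 structure
    print(classify_read([(100, 200), (500, 600)], annotation))  # exon-skipping isoform T2
    print(classify_read([(100, 200), (350, 400)], annotation))  # novel acceptor site
```

Requiring a consecutive run, rather than any subset of introns, keeps an exon-skipping read from being credited to the isoform that includes the skipped exon; a fuller treatment would also check that the read's terminal exons overlap the transcript's exons.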
Pages: 387-397
Number of pages: 11