Accurate Identification and Analysis of Human mRNA Isoforms Using Deep Long Read Sequencing

Cited by: 49
Authors
Tilgner, Hagen [1]
Raha, Debasish [2]
Habegger, Lukas [3, 4]
Mohiuddin, Mohammed [5]
Gerstein, Mark [3, 4]
Snyder, Michael [1]
Affiliations
[1] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
[2] Yale Univ, Dept Mol Cellular & Dev Biol, New Haven, CT 06120 USA
[3] Yale Univ, Program Computat Biol, New Haven, CT 06120 USA
[4] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06120 USA
[5] Roche, Branford, CT 06405 USA
Source
G3-GENES GENOMES GENETICS | 2013, Vol. 3, No. 3
Funding
National Institutes of Health (USA);
Keywords
RNA; Roche sequencing; human; splicing; transcriptome; HUMAN GENOME; SEQ; TRANSCRIPTOMES; ANNOTATION; EXPRESSION; LANDSCAPE; GENCODE; CELLS;
DOI
10.1534/g3.112.004812
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007; 090102;
Abstract
Precise identification of RNA-coding regions and transcriptomes of eukaryotes is a significant problem in biology. Currently, eukaryote transcriptomes are analyzed using deep short-read sequencing experiments of complementary DNAs. The resulting short reads are then aligned against a genome and annotated junctions to infer biological meaning. Here we use long-read complementary DNA datasets for the analysis of a eukaryotic transcriptome and generate two large datasets in the human K562 and HeLa S3 cell lines. Both datasets comprised at least 4 million reads and had median read lengths greater than 500 bp. We show that annotation-independent alignments of these reads provide partial gene structures that are very much in line with annotated gene structures, 15% of which have not been obtained in a previous de novo analysis of short reads. For long noncoding RNA (lncRNA) genes, however, we find an increased fraction of novel gene structures among our alignments. Other important aspects of transcriptome analysis, such as the description of cell type-specific splicing, can be performed in an accurate, reliable, and completely annotation-free manner, making the approach ideal for the analysis of transcriptomes of newly sequenced genomes. Furthermore, we demonstrate that long-read sequences can be assembled into full-length transcripts with considerable success. Our method is applicable to all long-read sequencing technologies.
Pages: 387-397
Number of pages: 11