IDP-denovo: de novo transcriptome assembly and isoform annotation by hybrid sequencing

Citations: 31
Authors
Fu, Shuhua [1]
Ma, Yingke [1]
Yao, Hui [2]
Xu, Zhichao [2]
Chen, Shilin [2]
Song, Jingyuan [2]
Au, Kin Fai [1,3]
Affiliations
[1] Univ Iowa, Dept Internal Med, Iowa City, IA 52242 USA
[2] Chinese Acad Med Sci, Inst Med Plant Dev, Peking Union Med Coll, Beijing 100193, Peoples R China
[3] Univ Iowa, Dept Biostat, Iowa City, IA 52242 USA
Funding
National Institutes of Health (USA);
Keywords
RNA-SEQ; MESSENGER-RNA; DNA-SEQUENCE; BIOLOGY; EXPRESSION;
DOI
10.1093/bioinformatics/bty098
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: In recent years, long-read (LR) sequencing technologies, such as Pacific Biosciences and Oxford Nanopore Technologies, have been demonstrated to substantially improve the quality of genome assembly and transcriptome characterization. Because genome assembly by LR sequencing remains costly, generating LRs for transcriptome characterization is the more affordable option. Consequently, when informative transcriptome LR data are available but a high-quality genome is not, a method for de novo transcriptome assembly and annotation is in high demand.
Results: Without a reference genome, IDP-denovo performs de novo transcriptome assembly, isoform annotation and quantification by integrating the strengths of LRs and short reads. Using the GM12878 human data as a gold standard, we demonstrated that IDP-denovo had superior sensitivity of transcript assembly and high accuracy of isoform annotation. In addition, IDP-denovo outputs two abundance indices to provide a comprehensive expression profile of genes/isoforms. IDP-denovo represents a robust approach for transcriptome assembly, isoform annotation and quantification for non-model organism studies. Applying IDP-denovo to a non-model organism, Dendrobium officinale, we discovered a number of novel genes and novel isoforms that are absent from the existing annotation library. These results reveal a high diversity of gene isoforms in D. officinale that the existing annotation had not captured.
Pages: 2168-2176
Number of pages: 9