Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation

Cited by: 61
Authors
Hara, Yuichiro [1]
Tatsumi, Kaori [1]
Yoshida, Michio [2]
Kajikawa, Eriko [2]
Kiyonari, Hiroshi [3,4]
Kuraku, Shigehiro [1]
Affiliations
[1] RIKEN Ctr Life Sci Technol, Phyloinformat Unit, Chuo Ku, Kobe, Hyogo 6500047, Japan
[2] RIKEN Ctr Dev Biol, Lab Vertebrate Body Plan, Chuo Ku, Kobe, Hyogo 6500047, Japan
[3] RIKEN Ctr Life Sci Technol, Anim Resource Dev Unit, Chuo Ku, Kobe, Hyogo 6500047, Japan
[4] RIKEN Ctr Life Sci Technol, Genet Engn Team, Chuo Ku, Kobe, Hyogo 6500047, Japan
Source
BMC GENOMICS | 2015 / Volume 16
Keywords
RNA-seq; Transcriptome sequencing; de novo assembly; Completeness assessment; Library insert length; CVG (core vertebrate genes); Madagascar ground gecko; RNA-SEQ DATA; GENE-EXPRESSION; GENERATION; ALIGNMENT; MODEL; RECONSTRUCTION; OPTIMIZATION; GENOME; COFFEE; TREES;
DOI
10.1186/s12864-015-2007-1
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information with relatively low cost and time investment, even for non-model species. However, considerable room remains for optimizing its workflow to take full advantage of continuously developing sequencing capacity.
Method: Transcriptome sequencing for three embryonic stages of the Madagascar ground gecko (Paroedura picta) was performed on the Illumina platform. The output reads were assembled de novo to reconstruct transcript sequences. To evaluate the completeness of the transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs.
Result: To take advantage of increased read lengths of >150 nt, we demonstrated that shortening the RNA fragmentation time results in a dramatic shift of the insert size distribution. To evaluate the products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species). The completeness assessment performed by the computational pipelines CEGMA and BUSCO with reference to CVG demonstrated higher accuracy and resolution than with the gene set previously established for this purpose. As a result of the assessment with CVG, we derived the most comprehensive transcript sequence set of the Madagascar ground gecko by assembling individual libraries and then clustering the assembled sequences based on their overall similarities.
Conclusion: Our results provide several insights into optimizing the de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. The approach and assembly assessment with CVG demonstrated here should be applicable to transcriptome analysis of other species as well as to whole genome analyses.
Pages: 12