Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation

Times cited: 61
Authors
Hara, Yuichiro [1]
Tatsumi, Kaori [1]
Yoshida, Michio [2]
Kajikawa, Eriko [2]
Kiyonari, Hiroshi [3,4]
Kuraku, Shigehiro [1]
Affiliations
[1] RIKEN Ctr Life Sci Technol, Phyloinformat Unit, Chuo Ku, Kobe, Hyogo 6500047, Japan
[2] RIKEN Ctr Dev Biol, Lab Vertebrate Body Plan, Chuo Ku, Kobe, Hyogo 6500047, Japan
[3] RIKEN Ctr Life Sci Technol, Anim Resource Dev Unit, Chuo Ku, Kobe, Hyogo 6500047, Japan
[4] RIKEN Ctr Life Sci Technol, Genet Engn Team, Chuo Ku, Kobe, Hyogo 6500047, Japan
Source
BMC Genomics | 2015 / Volume 16
Keywords
RNA-seq; Transcriptome sequencing; de novo assembly; Completeness assessment; Library insert length; CVG (core vertebrate genes); Madagascar ground gecko; RNA-SEQ DATA; GENE-EXPRESSION; GENERATION; ALIGNMENT; MODEL; RECONSTRUCTION; OPTIMIZATION; GENOME; COFFEE; TREES;
DOI
10.1186/s12864-015-2007-1
Chinese Library Classification (CLC)
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive amounts of sequence information at relatively low cost and time investment, even for non-model species. However, there remains considerable room for optimizing its workflow in order to take full advantage of continuously increasing sequencing capacity.
Method: Transcriptome sequencing of three embryonic stages of the Madagascar ground gecko (Paroedura picta) was performed on the Illumina platform. The output reads were assembled de novo to reconstruct transcript sequences. To evaluate the completeness of the transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs.
Result: To take advantage of increased read lengths of >150 nt, we showed that shortening the RNA fragmentation time resulted in a dramatic shift in the insert size distribution. To evaluate the products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species). Completeness assessment by the computational pipelines CEGMA and BUSCO with reference to CVG achieved higher accuracy and resolution than assessment with the gene set previously established for this purpose. Based on the assessment with CVG, we derived the most comprehensive transcript sequence set of the Madagascar ground gecko by assembling individual libraries and then clustering the assembled sequences based on their overall similarity.
Conclusion: Our results provide several insights into optimizing the de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. The approach and the assembly assessment with CVG demonstrated here would be applicable to transcriptome analyses of other species as well as to whole-genome analyses.
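The point about coordinating library insert size with read length can be made concrete with simple arithmetic. The Python sketch below is illustrative only and is not taken from the paper: it assumes both mates of a pair are sequenced to the same length and treats the insert as the fragment between the two adapters. The function names and the example insert lengths are hypothetical. When the insert is shorter than twice the read length, the two mates re-read the same bases; when it is shorter than the read length, each read runs on into adapter sequence.

```python
# Illustrative sketch (not from the paper): arithmetic linking paired-end
# read length to library insert length. Assumes both mates have the same
# length and that the insert is the fragment between the two adapters.
# Function names and example values are hypothetical.


def mate_overlap(insert_len: int, read_len: int) -> int:
    """Insert bases read by both mates; 0 when the mates do not meet."""
    covered_per_mate = min(read_len, insert_len)  # a mate cannot read past the insert
    return max(0, 2 * covered_per_mate - insert_len)


def adapter_readthrough(insert_len: int, read_len: int) -> int:
    """Bases per read that run past the insert into adapter sequence."""
    return max(0, read_len - insert_len)


def insert_bases_covered(insert_len: int, read_len: int) -> int:
    """Insert bases read by at least one mate."""
    return min(insert_len, 2 * read_len)


if __name__ == "__main__":
    read_len = 150  # the abstract mentions reads of >150 nt; 150 is illustrative
    for insert_len in (120, 200, 300, 450):  # hypothetical insert lengths
        print(
            f"insert={insert_len} nt: "
            f"overlap={mate_overlap(insert_len, read_len)}, "
            f"read-through per read={adapter_readthrough(insert_len, read_len)}, "
            f"insert bases covered={insert_bases_covered(insert_len, read_len)}"
        )
```

With 150-nt reads, a 200-nt insert yields 100 overlapping bases, so a quarter of each pair's 300 sequenced bases duplicates information already read by the other mate; only inserts of 300 nt or more avoid overlap entirely. Longer inserts, such as those presumably produced by the shortened fragmentation time described above, let the added read length contribute new sequence, and the pairing still links the two ends of the fragment.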
Pages: 12