Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation

Cited by: 61
Authors
Hara, Yuichiro [1]
Tatsumi, Kaori [1]
Yoshida, Michio [2]
Kajikawa, Eriko [2]
Kiyonari, Hiroshi [3, 4]
Kuraku, Shigehiro [1]
Affiliations
[1] RIKEN Ctr Life Sci Technol, Phyloinformat Unit, Chuo Ku, Kobe, Hyogo 6500047, Japan
[2] RIKEN Ctr Dev Biol, Lab Vertebrate Body Plan, Chuo Ku, Kobe, Hyogo 6500047, Japan
[3] RIKEN Ctr Life Sci Technol, Anim Resource Dev Unit, Chuo Ku, Kobe, Hyogo 6500047, Japan
[4] RIKEN Ctr Life Sci Technol, Genet Engn Team, Chuo Ku, Kobe, Hyogo 6500047, Japan
Source
BMC GENOMICS | 2015 / Vol. 16
Keywords
RNA-seq; Transcriptome sequencing; de novo assembly; Completeness assessment; Library insert length; CVG (core vertebrate genes); Madagascar ground gecko; RNA-SEQ DATA; GENE-EXPRESSION; GENERATION; ALIGNMENT; MODEL; RECONSTRUCTION; OPTIMIZATION; GENOME; COFFEE; TREES;
DOI
10.1186/s12864-015-2007-1
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information at relatively low cost and time investment, even for non-model species. However, considerable room remains for optimizing its workflow in order to take full advantage of continuously developing sequencing capacity.
Method: Transcriptome sequencing for three embryonic stages of the Madagascar ground gecko (Paroedura picta) was performed on the Illumina platform. The output reads were assembled de novo to reconstruct transcript sequences. To evaluate the completeness of the transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs.
Result: To take advantage of the increased read length of >150 nt, we demonstrated the use of a shortened RNA fragmentation time, which resulted in a dramatic shift of the insert size distribution. To evaluate the products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species). The completeness assessment performed by the computational pipelines CEGMA and BUSCO with reference to CVG demonstrated higher accuracy and resolution than the assessment with the gene set previously established for this purpose. As a result of the assessment with CVG, we derived the most comprehensive transcript sequence set of the Madagascar ground gecko by assembling individual libraries and then clustering the assembled sequences based on their overall similarities.
Conclusion: Our results provide several insights into optimizing the de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. The approach and assembly assessment with CVG demonstrated here should be applicable to transcriptome analysis of other species as well as to whole-genome analyses.
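As a rough illustration of the completeness assessment described in the abstract, the following minimal Python sketch computes the fraction of a reference gene set that is detected in an assembly. The function name, the gene identifiers, and the simple set-membership criterion are all hypothetical; CEGMA and BUSCO apply their own detection and scoring rules, which are not reproduced here.

```python
def completeness(reference_genes, detected_genes):
    """Return (number detected, fraction detected) for a reference gene set."""
    reference = set(reference_genes)
    if not reference:
        raise ValueError("reference gene set is empty")
    # Count only detected genes that belong to the reference set.
    hits = reference & set(detected_genes)
    return len(hits), len(hits) / len(reference)


# Hypothetical identifiers standing in for the 233 CVG entries.
cvg = [f"CVG{i:03d}" for i in range(1, 234)]

# Hypothetical detection result for one assembly (not data from the paper).
detected = cvg[:210] + ["unrelated_gene"]

n_hits, fraction = completeness(cvg, detected)
print(f"{n_hits}/{len(cvg)} reference genes detected ({fraction:.1%})")
```

Under these assumptions, comparing the returned fraction across assemblies built from different libraries or read lengths would give a simple ranking of their relative completeness.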
Pages: 12