Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation

Cited by: 61
Authors
Hara, Yuichiro [1]
Tatsumi, Kaori [1]
Yoshida, Michio [2]
Kajikawa, Eriko [2]
Kiyonari, Hiroshi [3, 4]
Kuraku, Shigehiro [1]
Affiliations
[1] RIKEN Ctr Life Sci Technol, Phyloinformat Unit, Chuo Ku, Kobe, Hyogo 6500047, Japan
[2] RIKEN Ctr Dev Biol, Lab Vertebrate Body Plan, Chuo Ku, Kobe, Hyogo 6500047, Japan
[3] RIKEN Ctr Life Sci Technol, Anim Resource Dev Unit, Chuo Ku, Kobe, Hyogo 6500047, Japan
[4] RIKEN Ctr Life Sci Technol, Genet Engn Team, Chuo Ku, Kobe, Hyogo 6500047, Japan
Source
BMC GENOMICS | 2015 / Vol. 16
Keywords
RNA-seq; Transcriptome sequencing; de novo assembly; Completeness assessment; Library insert length; CVG (core vertebrate genes); Madagascar ground gecko; RNA-SEQ DATA; GENE-EXPRESSION; GENERATION; ALIGNMENT; MODEL; RECONSTRUCTION; OPTIMIZATION; GENOME; COFFEE; TREES;
DOI
10.1186/s12864-015-2007-1
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information with relatively low cost and time investment, even for non-model species. However, there remains considerable room for optimizing its workflow to take full advantage of continuously developing sequencing capacity. Methods: Transcriptome sequencing for three embryonic stages of the Madagascar ground gecko (Paroedura picta) was performed with the Illumina platform. The output reads were assembled de novo to reconstruct transcript sequences. To evaluate the completeness of the transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs. Results: To take advantage of the increased read length of >150 nt, we shortened the RNA fragmentation time, which resulted in a dramatic shift of the insert size distribution. To evaluate the products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species). The completeness assessment performed by the computational pipelines CEGMA and BUSCO with reference to CVG demonstrated higher accuracy and resolution than the assessment with the gene set previously established for this purpose. As a result of the assessment with CVG, we derived the most comprehensive transcript sequence set of the Madagascar ground gecko by assembling individual libraries and then clustering the assembled sequences based on their overall similarities. Conclusions: Our results provide several insights into optimizing the de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested as improved connectivity of assemblies. The approach and assembly assessment with CVG demonstrated here would be applicable to transcriptome analysis of other species as well as whole genome analyses.
Pages: 12
Related papers
50 in total
  • [21] De novo assembly and annotation of the whole transcriptome of Oratosquilla oratoria
    Lou, Fangrui
    Gao, Tianxiang
    Cai, Shanshan
    Han, Zhiqiang
    MARINE GENOMICS, 2018, 38 : 17 - 20
  • [22] De novo assembly of kenaf (Hibiscus cannabinus) transcriptome using Illumina sequencing for gene discovery and marker identification
    Zhang, Liwu
    Wan, Xuebei
    Xu, Jiantang
    Lin, Lihui
    Qi, Jianmin
    MOLECULAR BREEDING, 2015, 35 (10)
  • [23] Enhancement of de novo sequencing, assembly and annotation of the Mongolian gerbil genome with transcriptome sequencing and assembly from several different tissues
    Cheng, Shifeng
    Fu, Yuan
    Zhang, Yaolei
    Xian, Wenfei
    Wang, Hongli
    Grothe, Benedikt
    Liu, Xin
    Xu, Xun
    Klug, Achim
    McCullagh, Elizabeth A.
    BMC GENOMICS, 2019, 20 (01)
  • [24] Sequencing, de novo assembly, annotation and SSR and SNP detection of sabaigrass (Eulaliopsis binata) transcriptome
    Zou, Dian
    Chen, Xinbo
    Zou, Dongsheng
    GENOMICS, 2013, 102 (01) : 57 - 62
  • [25] Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study
    Zhao, Qiong-Yi
    Wang, Yi
    Kong, Yi-Meng
    Luo, Da
    Li, Xuan
    Hao, Pei
    BMC BIOINFORMATICS, 2011, 12 : S2
  • [26] De Novo Transcriptome Assembly of the Chinese Swamp Buffalo by RNA Sequencing and SSR Marker Discovery
    Deng, Tingxian
    Pang, Chunying
    Lu, Xingrong
    Zhu, Peng
    Duan, Anqin
    Tan, Zhengzhun
    Huang, Jian
    Li, Hui
    Chen, Mingtan
    Liang, Xianwei
    PLOS ONE, 2016, 11 (01)
  • [27] Transcriptome Sequencing and De Novo Assembly of Golden Cuttlefish Sepia esculenta Hoyle
    Liu, Changlin
    Zhao, Fazhen
    Yan, Jingping
    Liu, Chunsheng
    Liu, Siwei
    Chen, Siqing
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2016, 17 (10)
  • [28] Sequencing, de novo assembly and annotation of a pink bollworm larval midgut transcriptome
    Tassone, Erica E.
    Zastrow-Hayes, Gina
    Mathis, John
    Nelson, Mark E.
    Wu, Gusui
    Flexner, J. Lindsey
    Carriere, Yves
    Tabashnik, Bruce E.
    Fabrick, Jeffrey A.
    GIGASCIENCE, 2016, 5
  • [29] De novo assembly and annotation of the Avicennia officinalis L. transcriptome
    Lyu, Haomin
    Li, Xinnian
    Guo, Zixiao
    He, Ziwen
    Shi, Suhua
    MARINE GENOMICS, 2018, 39 : 3 - 6
  • [30] Parallelization of the Trinity pipeline for de novo transcriptome assembly
    Sachdeva, V.
    Kim, C. S.
    Jordan, K. E.
    Winn, M. D.
    PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2014 : 567 - 576