Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation

Cited by: 61
Authors
Hara, Yuichiro [1]
Tatsumi, Kaori [1]
Yoshida, Michio [2]
Kajikawa, Eriko [2]
Kiyonari, Hiroshi [3, 4]
Kuraku, Shigehiro [1]
Affiliations
[1] RIKEN Ctr Life Sci Technol, Phyloinformat Unit, Chuo Ku, Kobe, Hyogo 6500047, Japan
[2] RIKEN Ctr Dev Biol, Lab Vertebrate Body Plan, Chuo Ku, Kobe, Hyogo 6500047, Japan
[3] RIKEN Ctr Life Sci Technol, Anim Resource Dev Unit, Chuo Ku, Kobe, Hyogo 6500047, Japan
[4] RIKEN Ctr Life Sci Technol, Genet Engn Team, Chuo Ku, Kobe, Hyogo 6500047, Japan
Source
BMC GENOMICS | 2015 / Volume 16
Keywords
RNA-seq; Transcriptome sequencing; de novo assembly; Completeness assessment; Library insert length; CVG (core vertebrate genes); Madagascar ground gecko; RNA-SEQ DATA; GENE-EXPRESSION; GENERATION; ALIGNMENT; MODEL; RECONSTRUCTION; OPTIMIZATION; GENOME; COFFEE; TREES;
DOI
10.1186/s12864-015-2007-1
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005; 0836; 090102; 100705;
Abstract
Background: RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information at relatively low cost and time investment, even for non-model species. However, there remains considerable room for optimizing its workflow in order to take full advantage of continuously developing sequencing capacity. Method: Transcriptome sequencing for three embryonic stages of the Madagascar ground gecko (Paroedura picta) was performed on the Illumina platform. The output reads were assembled de novo to reconstruct transcript sequences. To evaluate the completeness of the transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs. Result: To take advantage of increased read lengths of >150 nt, we applied a shortened RNA fragmentation time, which resulted in a dramatic shift of the insert size distribution. To evaluate the products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species). The completeness assessment performed by the computational pipelines CEGMA and BUSCO with reference to CVG demonstrated higher accuracy and resolution than the assessment with the gene set previously established for this purpose. As a result of the assessment with CVG, we derived the most comprehensive transcript sequence set of the Madagascar ground gecko by assembling individual libraries and then clustering the assembled sequences based on their overall similarities. Conclusion: Our results provide several insights into optimizing the de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. The approach and assembly assessment with CVG demonstrated here would be applicable to transcriptome analysis of other species as well as whole genome analyses.
Pages: 12