Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation

Cited by: 61
Authors
Hara, Yuichiro [1]
Tatsumi, Kaori [1]
Yoshida, Michio [2]
Kajikawa, Eriko [2]
Kiyonari, Hiroshi [3,4]
Kuraku, Shigehiro [1]
Affiliations
[1] RIKEN Ctr Life Sci Technol, Phyloinformat Unit, Chuo Ku, Kobe, Hyogo 6500047, Japan
[2] RIKEN Ctr Dev Biol, Lab Vertebrate Body Plan, Chuo Ku, Kobe, Hyogo 6500047, Japan
[3] RIKEN Ctr Life Sci Technol, Anim Resource Dev Unit, Chuo Ku, Kobe, Hyogo 6500047, Japan
[4] RIKEN Ctr Life Sci Technol, Genet Engn Team, Chuo Ku, Kobe, Hyogo 6500047, Japan
Source
BMC GENOMICS | 2015 / Vol. 16
Keywords
RNA-seq; Transcriptome sequencing; de novo assembly; Completeness assessment; Library insert length; CVG (core vertebrate genes); Madagascar ground gecko; RNA-SEQ DATA; GENE-EXPRESSION; GENERATION; ALIGNMENT; MODEL; RECONSTRUCTION; OPTIMIZATION; GENOME; COFFEE; TREES;
DOI
10.1186/s12864-015-2007-1
Chinese Library Classification numbers
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive amounts of sequence information at relatively low cost and time investment, even for non-model species. However, there remains considerable room for optimizing its workflow to take full advantage of continuously increasing sequencing capacity.
Method: Transcriptome sequencing of three embryonic stages of the Madagascar ground gecko (Paroedura picta) was performed on the Illumina platform. The output reads were assembled de novo to reconstruct transcript sequences. To evaluate the completeness of the transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs.
Result: To take advantage of read lengths greater than 150 nt, we showed that shortening the RNA fragmentation time resulted in a dramatic shift in the insert size distribution. To evaluate the products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species). Completeness assessment with the computational pipelines CEGMA and BUSCO, using CVG as the reference, demonstrated higher accuracy and resolution than assessment with the gene set previously established for this purpose. Based on the CVG assessment, we derived the most comprehensive transcript sequence set for the Madagascar ground gecko by assembling individual libraries separately and then clustering the assembled sequences based on their overall similarity.
Conclusion: Our results provide several insights into optimizing the de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. The approach and the CVG-based assembly assessment demonstrated here would be applicable to transcriptome analyses of other species as well as to whole-genome analyses.
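The CVG selection criterion (a gene is kept only if it is a one-to-one ortholog in every genome examined) and the completeness percentage it supports can be illustrated with a minimal Python sketch. It assumes a hypothetical input table mapping ortholog-group IDs to per-species copy counts; the species names, group IDs, and function names are illustrative only and do not represent the authors' pipeline. Unlike CEGMA and BUSCO, which also distinguish complete from fragmented hits, this sketch treats a reference gene as recovered whenever it is present.

```python
from typing import Dict, Set

# Hypothetical input: ortholog group ID -> {species name: number of gene copies}.
# Group IDs and species names are illustrative, not data from the paper.
OrthologTable = Dict[str, Dict[str, int]]


def single_copy_in_all(table: OrthologTable, species: Set[str]) -> Set[str]:
    """Return groups with exactly one copy in every listed species.

    A species absent from a group's entry counts as zero copies, so the
    group is rejected; two or more copies (a duplication) also reject it.
    """
    return {
        group
        for group, copies in table.items()
        if all(copies.get(sp, 0) == 1 for sp in species)
    }


def completeness(reference: Set[str], recovered: Set[str]) -> float:
    """Percentage of reference genes found among the recovered genes."""
    if not reference:
        raise ValueError("reference gene set is empty")
    return 100.0 * len(reference & recovered) / len(reference)


if __name__ == "__main__":
    species = {"human", "chicken", "zebrafish"}
    table = {
        "OG1": {"human": 1, "chicken": 1, "zebrafish": 1},
        "OG2": {"human": 1, "chicken": 1, "zebrafish": 2},  # duplicated in zebrafish
        "OG3": {"human": 1, "chicken": 1},                  # missing in zebrafish
        "OG4": {"human": 1, "chicken": 1, "zebrafish": 1},
    }
    core = single_copy_in_all(table, species)
    print(sorted(core))                 # ['OG1', 'OG4']
    print(completeness(core, {"OG1"}))  # 50.0
```

Requiring exactly one copy in every species, rather than at least one, is intended to exclude lineage-specific duplications, so that each reference gene corresponds to a single expected transcript when an assembly is scored.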
Pages: 12
Related papers (50 in total)
  • [11] De Novo Sequencing and Assembly Analysis of the Pseudostellaria heterophylla Transcriptome
    Li, Jun
    Zhen, Wei
    Long, Dengkai
    Ding, Ling
    Gong, Anhui
    Xiao, Chenghong
    Jiang, Weike
    Liu, Xiaoqing
    Zhou, Tao
    Huang, Luqi
    PLOS ONE, 2016, 11 (10):
  • [12] SEQUENCING AND DE NOVO TRANSCRIPTOME ASSEMBLY OF BRACHYPODIUM SYLVATICUM (POACEAE)
    Fox, Samuel E.
    Preece, Justin
    Kimbrel, Jeffrey A.
    Marchini, Gina L.
    Sage, Abigail
    Youens-Clark, Ken
    Cruzan, Mitchell B.
    Jaiswal, Pankaj
    APPLICATIONS IN PLANT SCIENCES, 2013, 1 (03):
  • [13] Illumina-based de novo transcriptome sequencing and analysis of Chinese forest musk deer
    Xu, Zhongxian
    Jie, Hang
    Chen, Binlong
    Gaur, Uma
    Wu, Nan
    Gao, Jian
    Li, Pinming
    Zhao, Guijun
    Zeng, Dejun
    Yang, Mingyao
    Li, Diyan
    JOURNAL OF GENETICS, 2017, 96 (06) : 1033 - 1040
  • [14] De novo transcriptome sequencing and assembly from apomictic and sexual Eragrostis curvula genotypes
    Garbus, Ingrid
    Rodolfo Romero, Jose
    Pablo Selva, Juan
    Cielo Pasten, Maria
    Chinestra, Carolina
    Carballo, Jose
    Carlos Zappacosta, Diego
    Echenique, Viviana
    PLOS ONE, 2017, 12 (11):
  • [15] Sequencing, de novo assembly and comparative analysis of Raphanus sativus transcriptome
    Wu, Gang
    Zhang, Libin
    Yin, Yongtai
    Wu, Jiangsheng
    Yu, Longjiang
    Zhou, Yanhong
    Li, Maoteng
    FRONTIERS IN PLANT SCIENCE, 2015, 6
  • [16] De novo assembly and validation of planaria transcriptome by massive parallel sequencing and shotgun proteomics
    Adamidi, Catherine
    Wang, Yongbo
    Gruen, Dominic
    Mastrobuoni, Guido
    You, Xintian
    Tolle, Dominic
    Dodt, Matthias
    Mackowiak, Sebastian D.
    Gogol-Doering, Andreas
    Oenal, Pinar
    Rybak, Agnieszka
    Ross, Eric
    Alvarado, Alejandro Sanchez
    Kempa, Stefan
    Dieterich, Christoph
    Rajewsky, Nikolaus
    Chen, Wei
    GENOME RESEARCH, 2011, 21 (07) : 1193 - 1200
  • [17] Sequencing and De Novo Assembly of the Toxicodendron radicans (Poison Ivy) Transcriptome
    Weisberg, Alexandra J.
    Kim, Gunjune
    Westwood, James H.
    Jelesko, John G.
    GENES, 2017, 8 (11)
  • [18] Sequencing, de novo assembly and annotation of Digitalis ferruginea subsp. schischkinii transcriptome
    Ünlü, Ercan Selçuk
    Kaya, Özge
    Eker, İsmail
    Gürel, Ekrem
    MOLECULAR BIOLOGY REPORTS, 2021, 48 : 127 - 137
  • [19] Characterization of Common Carp Transcriptome: Sequencing, De Novo Assembly, Annotation and Comparative Genomics
    Ji, Peifeng
    Liu, Guiming
    Xu, Jian
    Wang, Xumin
    Li, Jiongtang
    Zhao, Zixia
    Zhang, Xiaofeng
    Zhang, Yan
    Xu, Peng
    Sun, Xiaowen
    PLOS ONE, 2012, 7 (04):
  • [20] Comparative analysis of de novo transcriptome assembly
    Clarke, Kaitlin
    Yang Yi
    Marsh, Ronald
    Xie LingLin
    Zhang, Ke K.
    SCIENCE CHINA-LIFE SCIENCES, 2013, 56 (02) : 156 - 162