Robust high-throughput prokaryote de novo assembly and improvement pipeline for Illumina data

Cited by: 178
Authors
Page, Andrew J. [1]
De Silva, Nishadi [1]
Hunt, Martin [1]
Quail, Michael A. [2]
Parkhill, Julian [3]
Harris, Simon R. [3]
Otto, Thomas D. [4]
Keane, Jacqueline A. [1]
Affiliations
[1] Wellcome Trust Sanger Inst, Pathogen Informat, Wellcome Genome Campus, Hinxton CB10 1SA, Cambs, England
[2] Wellcome Trust Sanger Inst, Biochem Dev, Wellcome Genome Campus, Hinxton CB10 1SA, Cambs, England
[3] Wellcome Trust Sanger Inst, Pathogen Genom, Wellcome Genome Campus, Hinxton CB10 1SA, Cambs, England
[4] Wellcome Trust Sanger Inst, Parasite Genom, Wellcome Genome Campus, Hinxton CB10 1SA, Cambs, England
Source
MICROBIAL GENOMICS | 2016 / Volume 2 / Issue 08
Funding
Wellcome Trust (UK);
Keywords
Illumina; assembly; high-throughput; prokaryotic; GENOME SEQUENCE; ALGORITHM; EVOLUTION;
DOI
10.1099/mgen.0.000083
Chinese Library Classification number
Q3 [Genetics];
Subject classification codes
071007 ; 090102 ;
Abstract
The rapidly reducing cost of bacterial genome sequencing has led to its routine use in large-scale microbial analysis. Though mapping approaches can be used to find differences relative to the reference, many bacteria are subject to constant evolutionary pressures resulting in events such as the loss and gain of mobile genetic elements, horizontal gene transfer through recombination and genomic rearrangements. De novo assembly is the reconstruction of the underlying genome sequence, an essential step to understanding bacterial genome diversity. Here we present a high-throughput bacterial assembly and improvement pipeline that has been used to generate nearly 20 000 annotated draft genome assemblies in public databases. We demonstrate its performance on a public data set of 9404 genomes. We find all the genes used in multi-locus sequence typing schema present in 99.6 % of assembled genomes. When tested on low-, neutral- and high-GC organisms, more than 94 % of genes were present and completely intact. The pipeline has been proven to be scalable and robust with a wide variety of datasets without requiring human intervention. All of the software is available on GitHub under the GNU GPL open source license.
Pages: 7