BiosyntheticSPAdes: reconstructing biosynthetic gene clusters from assembly graphs

Cited by: 42
Authors
Meleshko, Dmitry [1 ,2 ]
Mohimani, Hosein [3 ,4 ]
Tracanna, Vittorio [5 ]
Hajirasouliha, Iman [6 ,7 ]
Medema, Marnix H. [5 ]
Korobeynikov, Anton [1 ,8 ]
Pevzner, Pavel A. [1 ,3 ]
Affiliations
[1] St Petersburg State Univ, Inst Translat Biomed, Ctr Algorithm Biotechnol, St Petersburg 19904, Russia
[2] Weill Cornell Med Coll, Triinst PhD Program Computat Biol & Med, New York, NY 10021 USA
[3] Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA 92093 USA
[4] Carnegie Mellon Univ, Sch Comp Sci, Computat Biol Dept, Pittsburgh, PA 15213 USA
[5] Wageningen Univ, Bioinformat Grp, NL-6708 PB Wageningen, Netherlands
[6] Cornell Univ, Weill Cornell Med, Dept Physiol & Biophys, Inst Computat Biomed, New York, NY 10021 USA
[7] Weill Cornell Med, Meyer Canc Ctr, Englander Inst Precis Med, New York, NY 10021 USA
[8] St Petersburg State Univ, Dept Stat Modelling, St Petersburg 198504, Russia
Funding
National Institutes of Health (USA); Andrew Mellon Foundation (USA); Russian Science Foundation;
Keywords
PEPTIDIC NATURAL-PRODUCTS; COMPLETE GENOME SEQUENCE; MASS-SPECTROMETRY; DATABASE SEARCH; BACTERIAL; DEREPLICATION; PREDICTION; PARALLEL; REVEALS; MODEL;
DOI
10.1101/gr.243477.118
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Predicting biosynthetic gene clusters (BGCs) is critically important for discovery of antibiotics and other natural products. While BGC prediction from complete genomes is a well-studied problem, predicting BGCs in fragmented genomic assemblies remains challenging. The existing BGC prediction tools often assume that each BGC is encoded within a single contig in the genome assembly, a condition that is violated for most sequenced microbial genomes where BGCs are often scattered through several contigs, making it difficult to reconstruct them. The situation is even more severe in shotgun metagenomics, where the contigs are often short, and the existing tools fail to predict a large fraction of long BGCs. While it is difficult to assemble BGCs in a single contig, the structure of the genome assembly graph often provides clues on how to combine multiple contigs into segments encoding long BGCs. We describe biosyntheticSPAdes, a tool for predicting BGCs in assembly graphs and demonstrate that it greatly improves the reconstruction of BGCs from genomic and metagenomics data sets.
Pages: 1352 / 1362
Number of pages: 11
Related papers
51 in total
[1]   SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing [J].
Bankevich, Anton ;
Nurk, Sergey ;
Antipov, Dmitry ;
Gurevich, Alexey A. ;
Dvorkin, Mikhail ;
Kulikov, Alexander S. ;
Lesin, Valery M. ;
Nikolenko, Sergey I. ;
Son Pham ;
Prjibelski, Andrey D. ;
Pyshkin, Alexey V. ;
Sirotkin, Alexander V. ;
Vyahhi, Nikolay ;
Tesler, Glenn ;
Alekseyev, Max A. ;
Pevzner, Pavel A. .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2012, 19 (05) :455-477
[2]   Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2) [J].
Bentley, SD ;
Chater, KF ;
Cerdeño-Tárraga, AM ;
Challis, GL ;
Thomson, NR ;
James, KD ;
Harris, DE ;
Quail, MA ;
Kieser, H ;
Harper, D ;
Bateman, A ;
Brown, S ;
Chandra, G ;
Chen, CW ;
Collins, M ;
Cronin, A ;
Fraser, A ;
Goble, A ;
Hidalgo, J ;
Hornsby, T ;
Howarth, S ;
Huang, CH ;
Kieser, T ;
Larke, L ;
Murphy, L ;
Oliver, K ;
O'Neil, S ;
Rabbinowitsch, E ;
Rajandream, MA ;
Rutherford, K ;
Rutter, S ;
Seeger, K ;
Saunders, D ;
Sharp, S ;
Squares, R ;
Squares, S ;
Taylor, K ;
Warren, T ;
Wietzorrek, A ;
Woodward, J ;
Barrell, BG ;
Parkhill, J ;
Hopwood, DA .
NATURE, 2002, 417 (6885) :141-147
[3]   GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses [J].
Besemer, J ;
Borodovsky, M .
NUCLEIC ACIDS RESEARCH, 2005, 33 :W451-W454
[4]   The antiSMASH database, a comprehensive database of microbial secondary metabolite biosynthetic gene clusters [J].
Blin, Kai ;
Medema, Marnix H. ;
Kottmann, Renzo ;
Lee, Sang Yup ;
Weber, Tilmann .
NUCLEIC ACIDS RESEARCH, 2017, 45 (D1) :D555-D559
[5]   The parallel and convergent universes of polyketide synthases and nonribosomal peptide synthetases [J].
Cane, DE ;
Walsh, CT .
CHEMISTRY & BIOLOGY, 1999, 6 (12) :R319-R325
[6]   Coelichelin, a new peptide siderophore encoded by the Streptomyces coelicolor genome: structure prediction from the sequence of its non-ribosomal peptide synthetase [J].
Challis, GL ;
Ravel, J .
FEMS MICROBIOLOGY LETTERS, 2000, 187 (02) :111-114
[7]   Characterization of Cyanobacterial Hydrocarbon Composition and Distribution of Biosynthetic Pathways [J].
Coates, R. Cameron ;
Podell, Sheila ;
Korobeynikov, Anton ;
Lapidus, Alla ;
Pevzner, Pavel ;
Sherman, David H. ;
Allen, Eric E. ;
Gerwick, Lena ;
Gerwick, William H. .
PLOS ONE, 2014, 9 (01)
[8]   How to apply de Bruijn graphs to genome assembly [J].
Compeau, Phillip E. C. ;
Pevzner, Pavel A. ;
Tesler, Glenn .
NATURE BIOTECHNOLOGY, 2011, 29 (11) :987-991
[9]   Natural products: A continuing source of novel drug leads [J].
Cragg, Gordon M. ;
Newman, David J. .
BIOCHIMICA ET BIOPHYSICA ACTA-GENERAL SUBJECTS, 2013, 1830 (06) :3670-3695
[10]   Identifying bacterial genes and endosymbiont DNA with Glimmer [J].
Delcher, Arthur L. ;
Bratke, Kirsten A. ;
Powers, Edwin C. ;
Salzberg, Steven L. .
BIOINFORMATICS, 2007, 23 (06) :673-679