BiosyntheticSPAdes: reconstructing biosynthetic gene clusters from assembly graphs

Cited by: 42
Authors
Meleshko, Dmitry [1, 2]
Mohimani, Hosein [3, 4]
Tracanna, Vittorio [5]
Hajirasouliha, Iman [6, 7]
Medema, Marnix H. [5]
Korobeynikov, Anton [1, 8]
Pevzner, Pavel A. [1, 3]
Affiliations
[1] St Petersburg State Univ, Inst Translat Biomed, Ctr Algorithm Biotechnol, St Petersburg 19904, Russia
[2] Weill Cornell Med Coll, Triinst PhD Program Computat Biol & Med, New York, NY 10021 USA
[3] Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA 92093 USA
[4] Carnegie Mellon Univ, Sch Comp Sci, Computat Biol Dept, Pittsburgh, PA 15213 USA
[5] Wageningen Univ, Bioinformat Grp, NL-6708 PB Wageningen, Netherlands
[6] Cornell Univ, Weill Cornell Med, Dept Physiol & Biophys, Inst Computat Biomed, New York, NY 10021 USA
[7] Weill Cornell Med, Meyer Canc Ctr, Englander Inst Precis Med, New York, NY 10021 USA
[8] St Petersburg State Univ, Dept Stat Modelling, St Petersburg 198504, Russia
Funding
US National Institutes of Health; Andrew W. Mellon Foundation (US); Russian Science Foundation
Keywords
PEPTIDIC NATURAL-PRODUCTS; COMPLETE GENOME SEQUENCE; MASS-SPECTROMETRY; DATABASE SEARCH; BACTERIAL; DEREPLICATION; PREDICTION; PARALLEL; REVEALS; MODEL
DOI
10.1101/gr.243477.118
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification codes
071010; 081704
Abstract
Predicting biosynthetic gene clusters (BGCs) is critically important for the discovery of antibiotics and other natural products. While BGC prediction from complete genomes is a well-studied problem, predicting BGCs in fragmented genomic assemblies remains challenging. Existing BGC prediction tools often assume that each BGC is encoded within a single contig in the genome assembly. This condition is violated for most sequenced microbial genomes, where BGCs are often scattered across several contigs, making them difficult to reconstruct. The situation is even more severe in shotgun metagenomics, where contigs are often short and existing tools fail to predict a large fraction of long BGCs. Although it is difficult to assemble a BGC into a single contig, the structure of the genome assembly graph often provides clues on how to combine multiple contigs into segments encoding long BGCs. We describe biosyntheticSPAdes, a tool for predicting BGCs in assembly graphs, and demonstrate that it greatly improves the reconstruction of BGCs from genomic and metagenomic data sets.
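The idea that graph structure can suggest how to join contigs can be illustrated with a toy sketch. This is not the biosyntheticSPAdes algorithm: the graph encoding, the function name `candidate_paths`, the contig labels, and the `max_len` bound are all hypothetical, chosen only to show how paths between contigs that carry biosynthetic domain hits might be enumerated as candidate BGC segments.

```python
from typing import Dict, List, Set


def candidate_paths(
    graph: Dict[str, List[str]],
    domain_contigs: Set[str],
    max_len: int = 4,
) -> List[List[str]]:
    """Enumerate simple paths that start and end on a contig with a domain hit.

    graph maps each contig to its successor contigs (hypothetical encoding).
    max_len bounds the number of contigs per path to keep the search small.
    """
    results: List[List[str]] = []

    def dfs(path: List[str]) -> None:
        node = path[-1]
        # Record any path of two or more contigs that ends on a domain contig.
        if len(path) > 1 and node in domain_contigs:
            results.append(list(path))
        if len(path) == max_len:
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # keep paths simple (no repeated contigs)
                path.append(nxt)
                dfs(path)
                path.pop()

    for start in sorted(domain_contigs):
        dfs([start])
    return results


# Toy graph: a branch between c1 and c4 splits the putative BGC across contigs.
graph = {"c1": ["c2", "c3"], "c2": ["c4"], "c3": ["c4"], "c4": []}
domain_contigs = {"c1", "c4"}  # assumed to carry biosynthetic domain hits
paths = candidate_paths(graph, domain_contigs)
print(paths)
```

In this toy graph, neither c1 nor c4 alone spans the cluster, but the two graph paths through c2 and c3 each give a candidate multi-contig segment that a downstream step would have to rank or filter.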
Pages: 1352-1362
Page count: 11