MacSyFinder: A Program to Mine Genomes for Molecular Systems with an Application to CRISPR-Cas Systems

Cited by: 264
Authors
Abby, Sophie S. [1, 2]
Neron, Bertrand [3]
Menager, Herve [3]
Touchon, Marie [1, 2]
Rocha, Eduardo P. C. [1, 2]
Affiliations
[1] Inst Pasteur, Paris, France
[2] CNRS, UMR3525, Paris, France
[3] Inst Pasteur, Ctr Informat Biol, Paris, France
Funding
European Research Council
Keywords
PALINDROMIC REPEATS; IDENTIFICATION; EVOLUTION; TIGRFAMS; CONTEXT; OPERONS; ORDER; TOOL;
DOI
10.1371/journal.pone.0110726
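Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract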
Motivation: Biologists often wish to use their knowledge of a few experimental models of a given molecular system to identify homologs in genomic data. We developed a generic tool for this purpose. Results: Macromolecular System Finder (MacSyFinder) provides a flexible framework to model the properties of molecular systems (cellular machinery or pathways), including their components, evolutionary associations with other systems, and genetic architecture. Modelled features also include functional analogs and the use of the same component by several different systems. Models are used to search for molecular systems in complete genomes or in unstructured data such as metagenomes. The components of the systems are searched for by sequence similarity using hidden Markov model (HMM) protein profiles. Hits are assigned to a given system according to their compliance with the content and organization of the system model. A graphical interface, MacSyView, facilitates the analysis of the results by showing overviews of component content and genomic context. To exemplify the use of MacSyFinder, we built models to detect and classify CRISPR-Cas systems following a previously established classification. We show that MacSyFinder makes it easy to define an accurate "Cas-finder" using publicly available protein profiles. Availability and Implementation: MacSyFinder is a standalone application implemented in Python. It requires Python 2.7, HMMER and makeblastdb (version 2.2.28 or higher). It is freely available with its source code under a GPLv3 license at https://github.com/gem-pasteur/macsyfinder. It is compatible with all platforms supporting Python and HMMER/makeblastdb. The "Cas-finder" (models and HMM profiles) is distributed as a compressed tarball archive as Supporting Information.
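The abstract's idea of assigning hits to a system by checking content and genomic organization can be illustrated with a minimal Python sketch. This is not MacSyFinder's code or API: the `Hit` class, the function names, the gap and quorum parameters, and the toy model (one mandatory gene `cas9`, accessory genes `cas1` and `cas2`) are all assumptions made for illustration. Profile hits are first clustered by gene proximity, then each cluster is tested against a minimum count of mandatory and total components.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    gene: str      # component name matched by a protein profile (hypothetical)
    position: int  # rank of the gene along the replicon


def cluster_hits(hits, max_gap):
    """Group hits whose consecutive positions differ by at most max_gap."""
    clusters = []
    for hit in sorted(hits, key=lambda h: h.position):
        if clusters and hit.position - clusters[-1][-1].position <= max_gap:
            clusters[-1].append(hit)
        else:
            clusters.append([hit])
    return clusters


def satisfies_model(cluster, mandatory, accessory, min_mandatory, min_total):
    """Check a cluster's distinct gene content against quorum thresholds."""
    genes = {h.gene for h in cluster}
    n_mandatory = len(genes & mandatory)
    n_total = n_mandatory + len(genes & accessory)
    return n_mandatory >= min_mandatory and n_total >= min_total


# Toy input: three co-localized hits and one distant, isolated hit.
hits = [Hit("cas1", 10), Hit("cas2", 11), Hit("cas9", 13), Hit("cas1", 250)]
clusters = cluster_hits(hits, max_gap=5)
systems = [
    c for c in clusters
    if satisfies_model(c, {"cas9"}, {"cas1", "cas2"}, min_mandatory=1, min_total=2)
]
```

In this toy run the three neighbouring hits form one candidate system, while the isolated `cas1` hit is rejected because it lacks the mandatory component. The actual tool's model format and decision rules are richer than this sketch.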
Pages: 9