Data mining of metagenomes to find novel enzymes: a non-computationally intensive method

Cited by: 2
Authors
Gongora-Castillo, Elsa [1]
Lopez-Ochoa, Luisa A. [2]
Apolinar-Hernandez, Max M. [3, 4]
Caamal-Pech, Aldo M. [3]
Contreras-de la Rosa, Perla A. [3]
Quiroz-Moreno, Adriana [3]
Ramirez-Prado, Jorge H. [3]
O'Connor-Sanchez, Aileen [3]
Affiliations
[1] Ctr Invest Cient Yucatan AC, CONACYT, Unidad Biotecnol, Merida, Yucatan, Mexico
[2] Ctr Invest Cient Yucatan AC, Unidad Bioquim & Biol Mol, Merida, Yucatan, Mexico
[3] Ctr Invest Cient Yucatan AC, Unidad Biotecnol, Merida, Yucatan, Mexico
[4] Univ Autonoma Nuevo Leon, Inst Biotecnol, San Nicolas De Los Garza, Nuevo Leon, Mexico
Keywords
Proteases; NGS; Bioinformatics pipeline; Pattern matching; Identification; Performance
DOI
10.1007/s13205-019-2044-6
Chinese Library Classification
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Non-computationally-intensive bioinformatics tools are needed to cope with the growing volume of large datasets produced by Next Generation Sequencing technologies. We present a simple and robust bioinformatics pipeline to search for novel enzymes in metagenomic sequences. The strategy is based on pattern searching, using conserved motifs coded as regular expressions as the reference. As a case study, we applied this scheme to search for novel S8A proteases in a publicly available metagenome. Briefly, (1) the metagenome was assembled and translated into amino acids; (2) patterns were matched using regular expressions; (3) retrieved sequences were annotated; and (4) diversity analyses were conducted. Following this pipeline, we identified nine sequences containing an S8 catalytic triad, starting from a metagenome containing 9,921,136 Illumina reads. The identity of these nine sequences was confirmed by BLASTp against databases at NCBI and MEROPS. Identities to their respective nearest orthologs ranged from 62 to 89%; these orthologs belonged to the phyla Proteobacteria, Actinobacteria, Planctomycetes, Bacteroidetes, and Cyanobacteria, consistent with the most abundant phyla reported for this metagenome. Together, these results support the idea that all nine are novel S8 sequences and strongly suggest that our methodology is robust and suitable for detecting novel enzymes.
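Step (2) of the pipeline, matching conserved motifs coded as regular expressions against translated sequences, could look roughly like the sketch below. The three motif strings are simplified placeholders chosen for illustration and are not the patterns used by the authors; the function name `find_candidates` and the example sequences are likewise hypothetical.

```python
import re

# Illustrative placeholder motifs (NOT the patterns from the paper):
# short stretches around the Asp, His and Ser residues of a catalytic triad.
ASP_MOTIF = r"D[ST]G"
HIS_MOTIF = r"HG[ST]"
SER_MOTIF = r"G[ST]S[MA]A"

# Require the three motifs in Asp -> His -> Ser order,
# separated by at least one arbitrary residue.
TRIAD = re.compile(f"{ASP_MOTIF}.+?{HIS_MOTIF}.+?{SER_MOTIF}")


def find_candidates(proteins):
    """Return {sequence_id: (start, end)} for sequences that match TRIAD."""
    hits = {}
    for seq_id, seq in proteins.items():
        match = TRIAD.search(seq.upper())
        if match:
            hits[seq_id] = match.span()
    return hits


# Hypothetical translated ORFs, made up for this example.
example = {
    "orf1": "MKVAVIDSGIDAAAAHGTHVAGTAAAAGTSMAAPHV",
    "orf2": "MKLLLLAAAAGGGGPPPP",
}

if __name__ == "__main__":
    for seq_id, span in find_candidates(example).items():
        print(seq_id, span)
```

A scan like this runs in a single pass over each sequence and needs no reference database, which is what keeps this kind of search computationally light; the retrieved candidates would still need annotation and confirmation, as in steps (3) and (4).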
Pages: 8