BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation

Cited by: 6
Authors
Dudek, Christian-Alexander [1]
Dannheim, Henning [1]
Schomburg, Dietmar [1]
Affiliations
[1] Tech Univ Carolo Wilhelmina Braunschweig, Dept Bioinformat & Biochem, Braunschweig Integrated Ctr Syst Biol BRICS, D-38106 Braunschweig, Germany
Keywords
PROTEIN FAMILY;
DOI
10.1371/journal.pone.0182216
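Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Subject classification code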
07 ; 0710 ; 09 ;
Abstract
The prediction of gene functions is crucial for a large number of different life science areas. Faster high-throughput sequencing techniques generate more and larger datasets. Manual annotation by classical wet-lab experiments is not suitable for these large amounts of data. We showed earlier that the automatic sequence pattern-based BrEPS protocol, based on manually curated sequences, can be used for the prediction of enzymatic functions of genes. The growing sequence databases provide the opportunity for more reliable patterns, but are also a challenge for the implementation of automatic protocols. We reimplemented and optimized the BrEPS pattern generation so that it can be applied to larger datasets within an acceptable timescale. The primary improvement of the new BrEPS protocol is the enhanced data selection step. Manually curated annotations from Swiss-Prot are used as a reliable source for function prediction of enzymes observed on protein level. The pool of sequences is extended by highly similar sequences from TrEMBL and Swiss-Prot. This allows us to restrict the selection of Swiss-Prot entries without losing the diversity of sequences needed to generate significant patterns. Additionally, a supporting pattern type was introduced by extending the patterns at semi-conserved positions with highly similar amino acids. Extended patterns have an increased complexity, which increases the chance of matching more sequences without losing the essential structural information of the pattern. To enhance the usability of the database, we introduced enzyme function prediction based on consensus EC numbers and IUBMB enzyme nomenclature. BrEPS is part of the Braunschweig Enzyme Database (BRENDA) and is available on a completely redesigned website and as a download. The database can be downloaded and used with the BrEPScmd command line tool for large-scale sequence analysis.
The BrEPS website and downloads for the database creation tool, command line tool and database are freely accessible at http://breps.tu-bs.de.
Pages: 12