Structure-based identification of catalytic residues

Cited by: 8
Authors
Yahalom, Ran [2]
Reshef, Dan [1]
Wiener, Ayana [2]
Frankel, Sagiv [2]
Kalisman, Nir [2]
Lerner, Boaz [3]
Keasar, Chen [1, 2]
Affiliations
[1] Ben Gurion Univ Negev, Dept Life Sci, IL-84105 Beer Sheva, Israel
[2] Ben Gurion Univ Negev, Dept Comp Sci, IL-84105 Beer Sheva, Israel
[3] Ben Gurion Univ Negev, Dept Ind Engn & Management, IL-84105 Beer Sheva, Israel
Funding
National Institutes of Health (NIH), USA
Keywords
catalytic residues; functional annotation; support vector machine (SVM); energy terms; spatial averaging; feature selection; class imbalance; protein structures; active sites; functional sites; prediction; stability; sequence; enzymes; classification; substitutions
DOI
10.1002/prot.23020
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
The identification of catalytic residues is an essential step in functional characterization of enzymes. We present a purely structural approach to this problem, which is motivated by the difficulty of evolution-based methods to annotate structural genomics targets that have few or no homologs in the databases. Our approach combines a state-of-the-art support vector machine (SVM) classifier with novel structural features that augment structural clues by spatial averaging and Z scoring. Special attention is paid to the class imbalance problem that stems from the overwhelming number of non-catalytic residues in enzymes compared to catalytic residues. This problem is tackled by: (1) optimizing the classifier to maximize a performance criterion that considers both Type I and Type II errors in the classification of catalytic and non-catalytic residues; (2) under-sampling non-catalytic residues before SVM training; and (3) during SVM training, penalizing errors in learning catalytic residues more than errors in learning non-catalytic residues. Tested on four enzyme datasets, one specifically designed by us to mimic the structural genomics scenario and three previously evaluated datasets, our structure-based classifier is never inferior to similar structure-based classifiers and comparable to classifiers that use both structural and evolutionary features. In addition to the evaluation of the performance of catalytic residue identification, we also present detailed case studies on three proteins. This analysis suggests that many false positive predictions may correspond to binding sites and other functional residues. A web server that implements the method, our own-designed database, and the source code of the programs are publicly available at http://www.cs.bgu.ac.il/~meshi/functionPrediction. Proteins 2011; 79:1952-1963. © 2011 Wiley-Liss, Inc.
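The feature augmentation and class-imbalance steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 10 Å neighborhood radius, the 5:1 under-sampling ratio, and the choice of the Matthews correlation coefficient as a criterion that accounts for both Type I and Type II errors are all assumptions made here for illustration, since the abstract does not specify them.

```python
import math
import random


def z_scores(values):
    """Standardize one feature across the residues of a single protein."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        return [0.0] * n  # constant feature carries no signal
    return [(v - mean) / std for v in values]


def spatial_average(coords, values, radius=10.0):
    """Average a per-residue feature over all residues within `radius`.

    `coords` holds one (x, y, z) point per residue; the radius is an
    assumed value, not one taken from the paper.
    """
    averaged = []
    for ci in coords:
        near = [v for cj, v in zip(coords, values) if math.dist(ci, cj) <= radius]
        averaged.append(sum(near) / len(near))  # `near` always includes the residue itself
    return averaged


def undersample(labels, ratio=5, seed=0):
    """Keep every catalytic residue (label 1) and at most `ratio` times as
    many randomly chosen non-catalytic residues (label 0). Returns indices."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    kept_neg = rng.sample(neg, min(len(neg), ratio * len(pos)))
    return sorted(pos + kept_neg)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient, one possible criterion that
    penalizes both false positives and false negatives (an assumption)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

The third measure in the abstract, penalizing errors on catalytic residues more heavily during SVM training, would normally be a per-class misclassification cost set in the SVM library itself and is not shown here.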
Pages: 1952-1963
Page count: 12