Rigorous assessment and integration of the sequence and structure based features to predict hot spots

Cited by: 6
Authors
Chen, Ruoying [1, 2]
Chen, Wenjing [1]
Yang, Sixiao [1]
Wu, Di [3]
Wang, Yong [4]
Tian, Yingjie [1]
Shi, Yong [1, 5]
Affiliations
[1] Chinese Acad Sci, Res Ctr Fictitious Econ & Data Sci, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Grad Univ, Coll Life Sci, Beijing 100049, Peoples R China
[3] Tongji Univ, Coll Life Sci & Technol, Dept Biomed Engn, Shanghai 200092, Peoples R China
[4] Chinese Acad Sci, Acad Math & Syst Sci, Beijing 100190, Peoples R China
[5] Univ Nebraska, Coll Informat Sci & Technol, Omaha, NE 68182 USA
Funding
National Natural Science Foundation of China
Keywords
PROTEIN-INTERACTION SITES; FREE-ENERGY; HSSP DATABASE; BINDING INTERFACE; RESIDUES; IDENTIFICATION; CONSERVATION; RECEPTOR; COMPLEX; RECOGNITION;
DOI
10.1186/1471-2105-12-311
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Hot spot prediction is therefore increasingly important for understanding the essence of protein interactions and for narrowing the search space in drug design. Many computational methods have been developed, each proposing different features. However, a comparative assessment of these features, together with more effective and accurate methods, is still urgently needed.
Results: In this study, we first comprehensively collect the features used to discriminate hot spots from non-hot spots and analyze their distributions. We find that hot spots have lower relative accessible surface area (relASA) and a larger relative change in ASA, suggesting that hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts, including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes), whereas in the Ab- dataset (antigen-antibody complexes excluded) these two features differ significantly between hot spots and non-hot spots. Second, we explore the predictive ability of each feature and of combinations of features using support vector machines (SVMs). The results indicate that the sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on an independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method by predicting the hot spots of two protein complexes.
Conclusion: Experimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integration of features and machine learning methods can remarkably improve the predictive performance for hot spots.
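The Results report precision, recall, F1 score, and AUC for a binary hot spot classifier. As a reading aid, the sketch below shows how these four quantities are conventionally computed from true labels, predicted labels, and classifier scores. It is not the authors' evaluation code: the function names `classification_metrics` and `auc` and the toy data are assumptions made for illustration, and the label 1 is assumed to denote a hot spot.

```python
def classification_metrics(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, with 1 = hot spot (assumed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auc(y_true, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Counts the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the paper).
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.3, 0.6, 0.8, 0.1]

precision, recall, f1 = classification_metrics(y_true, y_pred)
print(precision, recall, f1, auc(y_true, scores))
```

The pairwise AUC is quadratic in the number of residues, which is acceptable for small mutagenesis datasets; a rank-based implementation would be preferable for larger ones.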
Pages: 14