Prediction of Protein-Protein Interaction Sites in Sequences and 3D Structures by Random Forests

Cited by: 138
Authors
Sikic, Mile [1]
Tomic, Sanja [2]
Vlahovicek, Kristian [3]
Affiliations
[1] Univ Zagreb, Fac Elect Engn & Comp, Dept Elect Syst & Informat Proc, Zagreb 41000, Croatia
[2] Rudjer Boskovic Inst, Zagreb, Croatia
[3] Univ Zagreb, Fac Sci, Dept Mol Biol, Bioinformat Grp, Zagreb 41000, Croatia
Keywords
SOLVENT ACCESSIBILITY; BINDING; CLASSIFIER; CONSERVATION; INTERFACES; INSIGHTS; PROFILE
DOI
10.1371/journal.pcbi.1000278
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i) a combination of sequence- and structure-derived parameters and (ii) sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance increases to a precision of 76% and a recall of 38% with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras-Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite a high accuracy using only sequence information.
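The two quantitative ideas in the abstract, a nine-residue sliding window and the F-measure, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the "X" padding character for positions near the chain ends, and the example sequence are assumptions made here for illustration, and the F-measure is assumed to be the usual harmonic mean of precision and recall, which is consistent with the figures reported above.

```python
def sliding_windows(sequence, size=9, pad="X"):
    """Return one window of `size` residues centred on each position.

    Assumption: chain ends are padded with `pad` so that every residue
    gets a full-length window; the abstract does not say how ends are
    handled.
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd to have a central residue")
    half = size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical 11-residue sequence, used only to show the windowing.
    for centre, window in zip("ACDEFGHIKLM", sliding_windows("ACDEFGHIKLM")):
        print(centre, window)

    # Precision and recall values as reported in the abstract.
    print(f"sequence only:        F = {f_measure(0.84, 0.26):.2f}")
    print(f"sequence + structure: F = {f_measure(0.76, 0.38):.2f}")
```

In a predictor of this kind, each window would be encoded as a feature vector and labelled by whether its central residue is an interaction site before being passed to a Random Forests classifier; that encoding step is left out here because the abstract does not specify it.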
Pages: 9