Classification of conformational stability of protein mutants from 3D pseudo-folding graph representation of protein sequences using support vector machines

Cited by: 25
Authors
Fernandez, Michael [1]
Caballero, Julio [1,2]
Fernandez, Leyden [1]
Abreu, Jose Ignacio [1,3]
Acosta, Gianco [4]
Affiliations
[1] Univ Matanzas, Fac Agron, Ctr Biotechnol Studies, Mol Modeling Grp, Matanzas 44740, Cuba
[2] Univ Talca, Ctr Bioinformat & Simulac Mol, Talca, Chile
[3] Univ Matanzas, Fac Informat, Artificial Intelligence Lab, Matanzas 44740, Cuba
[4] Natl Bioinformat Ctr, Havana 10200, Cuba
Keywords
protein stability prediction; point mutations; kernel-based methods; graph similarity
DOI
10.1002/prot.21524
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
This work reports a novel 3D pseudo-folding graph representation of protein sequences for modeling purposes. Amino acid Euclidean distance matrices (EDMs) encode primary structural information. Amino Acid Pseudo-Folding 3D Distances Count (AAp3DC) descriptors, calculated from the EDMs of a large data set of 1363 single protein mutants of 64 proteins, were tested for building a classifier for the sign of the change in thermal unfolding Gibbs free energy (ΔΔG) upon single mutations. An optimum support vector machine (SVM) with a radial basis function (RBF) kernel recognized stable and unstable mutants well, with accuracies over 70% in cross-validation tests. To the best of our knowledge, this result for stable mutant recognition is the highest ever reported for a sequence-based predictor with more than 1000 mutants. Furthermore, the model adequately classified disease-associated mutations of human prion protein and human transthyretin.
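As a rough illustration of two building blocks named in the abstract, the sketch below computes a Euclidean distance matrix from 3D coordinates and evaluates an RBF kernel between two descriptor vectors. It is not the authors' implementation: the abstract does not say how the pseudo-folding coordinates or the AAp3DC descriptors are derived, so the toy coordinates, the function names, and the `gamma` value are all assumptions made for illustration.

```python
import math


def euclidean_distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(coords)
    edm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            edm[i][j] = edm[j][i] = d
    return edm


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2); gamma is an assumed value."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical 3D positions for a four-residue toy chain (not from the paper).
coords = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
edm = euclidean_distance_matrix(coords)

# Kernel similarity between two rows of the EDM, used here as stand-in descriptors.
similarity = rbf_kernel(edm[0], edm[1])
```

In practice the kernel would be evaluated on descriptor vectors derived from each mutant's EDM, and the SVM hyperparameters would be tuned by cross-validation as the abstract describes.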
Pages: 167-175
Number of pages: 9
Related papers
44 in total
[1]  
AGUEROCHAPIN G, 2006, FEBS LETT, V580, P723
[2]  
[Anonymous], 2001, LIBSVM LIB SUPPORT V
[3]   The effect of disease-associated mutations on the folding pathway of human prion protein [J].
Apetri, AC ;
Surewicz, K ;
Surewicz, WK .
JOURNAL OF BIOLOGICAL CHEMISTRY, 2004, 279 (17) :18008-18014
[4]   On graphical and numerical representation of protein sequences [J].
Bai, FL ;
Wang, TM .
JOURNAL OF BIOMOLECULAR STRUCTURE & DYNAMICS, 2006, 23 (05) :537-545
[5]   ProTherm, version 4.0: thermodynamic database for proteins and mutants [J].
Bava, KA ;
Gromiha, MM ;
Uedaira, H ;
Kitajima, K ;
Sarai, A .
NUCLEIC ACIDS RESEARCH, 2004, 32 :D120-D121
[6]   Prudent modeling of core polar residues in computational protein design [J].
Bolon, DN ;
Marcus, JS ;
Ross, SA ;
Mayo, SL .
JOURNAL OF MOLECULAR BIOLOGY, 2003, 329 (03) :611-622
[7]  
BURGES CJC, 1998, DATA MIN KNOWL DISC, V2, P7
[8]   Proteometric study of ghrelin receptor function variations upon mutations using amino acid sequence autocorrelation vectors and genetic algorithm-based least square support vector machines [J].
Caballero, Julio ;
Fernandez, Leyden ;
Garriga, Miguel ;
Abreu, Jose Ignacio ;
Collina, Simona ;
Fernandez, Michael .
JOURNAL OF MOLECULAR GRAPHICS & MODELLING, 2007, 26 (01) :166-178
[9]   Amino acid sequence autocorrelation vectors and ensembles of Bayesian-regularized genetic neural networks for prediction of conformational stability of human lysozyme mutants [J].
Caballero, Julio ;
Fernandez, Leyden ;
Abreu, Jose Ignacio ;
Fernandez, Michael .
JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2006, 46 (03) :1255-1268
[10]   NMR structures of three single-residue variants of the human prion protein [J].
Calzolai, L ;
Lysek, DA ;
Güntert, P ;
von Schroetter, C ;
Riek, R ;
Zahn, R ;
Wüthrich, K .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (15) :8340-8345