Structure Based Thermostability Prediction Models for Protein Single Point Mutations with Machine Learning Tools

Cited by: 39
Authors
Jia, Lei [1]
Yarlagadda, Ramya [2]
Reed, Charles C. [2]
Affiliations
[1] Amgen Inc, Thousand Oaks, CA 91320 USA
[2] Intrexon, Germantown, MD USA
Keywords
THERMODYNAMIC DATABASE; STABILITY CHANGES; PROTHERM; CLASSIFICATION; PARAMETERS; REGRESSION; HYDRATION; MUTANTS; SERVER; QSAR;
DOI
10.1371/journal.pone.0138022
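Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code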
07 ; 0710 ; 09 ;
Abstract
Thermostability problems caused by point mutations are common in protein engineering. An application that predicts the thermostability of mutants can help guide decision making in protein design via mutagenesis. In silico point mutation scanning is frequently used to find "hot spots" in proteins for focused mutagenesis. ProTherm (http://gibk26.bio.kyutech.ac.jp/jouhou/Protherm/protherm.html) is a public database containing experimentally measured thermostability data for thousands of protein mutants. Two data sets were obtained from this database, each based on a differently measured thermostability property of protein single point mutations: the unfolding free energy change (ddG) and the melting temperature change (dTm). Folding free energy changes calculated with Rosetta, structural information about the point mutations, and amino acid physical properties were used to build thermostability prediction models with informatics modeling tools. Five supervised machine learning methods (support vector machine, random forests, artificial neural network, naive Bayes classifier, and K nearest neighbor) and partial least squares regression were used to build the prediction models. Binary and ternary classification models as well as regression models were built and evaluated. Data set redundancy and balancing, the reverse mutations technique, feature selection, and comparison with other published methods are discussed. The Rosetta-calculated folding free energy change ranked as the most influential feature in all prediction models. Other descriptors also contributed significantly to the accuracy of the prediction models.
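As a rough illustration of two ideas named in the abstract, the sketch below shows one common reading of the reverse mutations technique (for each measured A→B mutation, add a hypothetical B→A record with the sign of the stability change flipped) and a simple ternary labelling. The abstract does not describe either procedure in detail, so the `Mutation` record, the function names, and the 0.5 cutoff are assumptions made for illustration only, not the authors' implementation. In practice the input features would also have to be recomputed for each reverse record, which is not shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mutation:
    """Hypothetical record for one single point mutation."""
    wild_type: str  # one-letter code of the original residue
    position: int   # residue number in the sequence
    mutant: str     # one-letter code of the substituted residue
    ddg: float      # measured stability change; sign convention follows the data set


def reverse(m: Mutation) -> Mutation:
    """Build the hypothetical reverse mutation: swap residues, negate the label."""
    return Mutation(m.mutant, m.position, m.wild_type, -m.ddg)


def augment(data: List[Mutation]) -> List[Mutation]:
    """Return the original records followed by their reverse mutations."""
    return list(data) + [reverse(m) for m in data]


def ternary_label(ddg: float, cutoff: float = 0.5) -> int:
    """Map a value to -1, 0 or +1; the cutoff is an illustrative assumption."""
    if ddg > cutoff:
        return 1
    if ddg < -cutoff:
        return -1
    return 0
```

For example, `augment([Mutation("A", 42, "G", -1.2)])` returns the original record plus `Mutation("G", 42, "A", 1.2)`. Doubling the data this way also tends to even out the two signs of the label, which may be why the abstract lists it alongside data set balancing.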
Pages: 19