Linear regression models for solvent accessibility prediction in proteins

Cited by: 94
Authors
Wagner, M
Adamczak, R
Porollo, A
Meller, J
Affiliations
[1] Childrens Hosp Res Fdn, Div Biomed Informat, Cincinnati, OH 45229 USA
[2] Nicholas Copernicus Univ, Dept Informat, PL-87100 Torun, Poland
Keywords
relative solvent accessibility; support vector regression; least squares regression; neural networks; classification; protein structure prediction
DOI
10.1089/cmb.2005.12.355
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent-exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
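The ideas named in the abstract (a real-valued regression target, an error-insensitivity term, and the two-class projection used for comparison with classifiers) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' setup: the features are synthetic random numbers, the 0.25 threshold and the 0.1 insensitivity value are arbitrary choices, and only the least squares model is actually fitted; the epsilon-insensitive loss is shown as an evaluation function, not as a trained SVR.

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Mean epsilon-insensitive (L1) loss: errors below eps cost nothing."""
    err = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return float(np.mean(np.maximum(0.0, err - eps)))

def two_class_projection(rsa, threshold=0.25):
    """Project real-valued RSA in [0, 1] onto buried (0) / exposed (1).
    The threshold is a hypothetical choice, not taken from the paper."""
    return (np.asarray(rsa) > threshold).astype(int)

def fit_least_squares(X, y):
    """Ordinary least squares with an appended intercept column."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict(X, w):
    """Linear prediction, clipped to the valid RSA range [0, 1]."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.clip(A @ w, 0.0, 1.0)

# Synthetic stand-in for per-residue features (assumption, not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([0.10, -0.05, 0.08, 0.0, 0.03])
y = np.clip(0.3 + X @ true_w + rng.normal(scale=0.05, size=200), 0.0, 1.0)

w = fit_least_squares(X, y)
y_hat = predict(X, w)

# Two-class agreement between projected targets and projected predictions.
acc = float(np.mean(two_class_projection(y) == two_class_projection(y_hat)))
print("epsilon-insensitive loss:", eps_insensitive_loss(y, y_hat))
print("two-class agreement:", acc)
```

A linear SVR would minimize this epsilon-insensitive loss plus a penalty term instead of the squared error; the two metaparameters mentioned in the abstract correspond to `eps` here and to the weight on that penalty.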
Pages: 355-369
Number of pages: 15