Predicting the Protein Folding Rate Based on Sequence Feature Screening and Support Vector Regression

Cited by: 0
Authors
Li Yong
Zhou Wei
Dai Zhi-Jun
Chen Yuan
Wang Zhi-Ming
Yuan Zhe-Ming [1]
Affiliation
[1] Hunan Agr Univ, Hunan Prov Key Lab Crop Germplasm Innovat & Utili, Changsha 410128, Hunan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Protein folding; Folding rate prediction; High-dimensional feature; Feature screening; Support vector regression; HIGH-DIMENSIONAL DATA; AMINO-ACID-SEQUENCE; FEATURE-SELECTION; CONTACT ORDER; ALGORITHM;
DOI
10.3866/PKU.WHXB201404091
Chinese Library Classification
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline classification code
070304 ; 081704 ;
Abstract
Folding rate prediction plays an important role in clarifying the protein folding mechanism. In this work, we collected 115 proteins with known folding rates, including two-state, multi-state, and mixed-state proteins. To characterize primary structure more comprehensively, we considered sequence length, residue composition at different scales, k-space features for residue pairs, and geostatistical association features between residue positions, with residues replaced by their corresponding physicochemical properties. Each protein sequence was thus represented by a numeric vector of 9357 values. Using an improved binary matrix shuffling filter followed by a multi-round worst descriptor elimination method, we selected 23 features with a clear meaning from this high-dimensional set. We then built a nonlinear support vector regression (SVR) model relating the folding rate to the 23 retained features. The correlation coefficient under jackknife cross-validation was 0.95. This prediction accuracy was higher than results reported in the literature and than those obtained with other reference feature selection methods. Finally, we established an interpretability system for SVR, which showed that the nonlinear regression relationship between the folding rates and the retained features was highly significant. Analysis of the effect of each retained descriptor suggested that the protein folding rate may be closely related to sequence length, medium- and short-range features, and triplet residue composition features, among others.
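The sequence-to-vector step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact feature definitions, so this assumes that "k-space features" means the frequency of residue pairs separated by k intervening residues, and it covers only length, single-residue composition, and those pair frequencies. The function names (`composition`, `k_spaced_pairs`, `featurize`) and the `max_k` parameter are hypothetical, and the resulting vector is much smaller than the 9357-value representation used in the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Fraction of each standard residue in the sequence."""
    n = len(seq)
    return {aa: seq.count(aa) / n for aa in AMINO_ACIDS}


def k_spaced_pairs(seq, k):
    """Frequency of residue pairs (a, b) with k residues between them.

    Assumed definition: positions i and i + k + 1 form one pair, and
    counts are normalized by the number of such position pairs.
    """
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = len(seq) - k - 1
    if total <= 0:
        # Sequence too short to contain any pair at this spacing.
        return {pair: 0.0 for pair in counts}
    for i in range(total):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # skip non-standard residue symbols
            counts[pair] += 1
    return {pair: c / total for pair, c in counts.items()}


def featurize(seq, max_k=2):
    """Build a named feature dictionary for one protein sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    features = {"length": float(len(seq))}
    for aa, frac in composition(seq).items():
        features["comp_" + aa] = frac
    for k in range(max_k + 1):
        for pair, freq in k_spaced_pairs(seq, k).items():
            features[f"pair_k{k}_{pair}"] = freq
    return features


if __name__ == "__main__":
    vec = featurize("ACDA", max_k=2)
    print(len(vec), vec["comp_A"], vec["pair_k1_AD"])
```

With `max_k=2` this yields 1 + 20 + 3 × 400 = 1221 named features per sequence. Keeping the features in a dictionary keyed by name makes it easy to see which descriptors survive a later screening step, which matters when the retained features are meant to be interpretable.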
Pages: 1091-1098
Number of pages: 8