Predicting the Protein Folding Rate Based on Sequence Feature Screening and Support Vector Regression

Cited by: 0
Authors
Li Yong
Zhou Wei
Dai Zhi-Jun
Chen Yuan
Wang Zhi-Ming
Yuan Zhe-Ming [1]
Institution
[1] Hunan Agricultural University, Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Changsha 410128, Hunan, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Protein folding; Folding rate prediction; High-dimensional feature; Feature screening; Support vector regression; High-dimensional data; Amino acid sequence; Feature selection; Contact order; Algorithm
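DOI
10.3866/PKU.WHXB201404091
Chinese Library Classification (CLC) number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Discipline classification codes
070304; 081704
Abstract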
Folding rate prediction plays an important role in clarifying the mechanism of protein folding. In this work, we collected 115 proteins with known folding rates, including two-state, multi-state, and mixed-state proteins. To characterize the primary-structure information of each protein more comprehensively, we considered sequence length, residue composition at different scales, k-spaced residue-pair features, and geostatistical association features between residues at different positions, with each residue replaced by its corresponding physicochemical properties. Each protein sequence was thus represented by a numeric vector of 9357 features. After applying an improved binary matrix shuffling filter and a multi-round worst-descriptor elimination method to these high-dimensional features, we retained 23 features with clear meanings for each sample. We then constructed a nonlinear support vector regression (SVR) model relating the folding rate to the 23 retained features. The correlation coefficient of jackknife cross-validation was 0.95, and the prediction accuracy was superior to results previously reported in the literature and to those obtained with the other reference feature-selection methods. Finally, we established an interpretability system for SVR; the analysis showed that the nonlinear regression relationship between the folding rates and the retained features was highly significant. Further analysis of the effect of each retained descriptor on the folding rate suggested that the protein folding rate may be closely related to sequence length, features associated with medium- and short-range residue interactions, triplet-residue composition features, and others.
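The sequence-encoding step summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce their 9357 descriptors. Several choices here are assumptions made only for illustration: the Kyte-Doolittle hydropathy scale stands in for the substituted physicochemical property, an empirical semivariogram stands in for the geostatistical association measure, and the names and parameters (`encode`, `max_k`, `max_lag`) are hypothetical. Feature screening and the SVR model are not shown.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# ASSUMPTION: Kyte-Doolittle hydropathy is used only as an example
# physicochemical scale; it is not necessarily one the paper used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq, n):
    """Relative frequency of every n-residue word (1 = single residues, 2 = pairs, 3 = triplets)."""
    words = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts["".join(w)] / total for w in product(AMINO_ACIDS, repeat=n)]


def k_spaced_pairs(seq, k):
    """Relative frequency of residue pairs separated by k intervening residues."""
    pairs = [seq[i] + seq[i + k + 1] for i in range(len(seq) - k - 1)]
    counts = Counter(pairs)
    total = max(len(pairs), 1)
    return [counts[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]


def semivariance(values, h):
    """Empirical semivariogram: gamma(h) = sum((x_i - x_{i+h})^2) / (2 * N(h))."""
    diffs = [(values[i] - values[i + h]) ** 2 for i in range(len(values) - h)]
    return sum(diffs) / (2 * len(diffs)) if diffs else 0.0


def encode(seq, max_k=3, max_lag=5):
    """Hypothetical encoder: length, 1/2/3-residue composition, k-spaced pairs, semivariances."""
    seq = seq.upper()  # assumes only the 20 standard residues are present
    features = [float(len(seq))]
    for n in (1, 2, 3):
        features += composition(seq, n)
    # k = 0 would duplicate the dipeptide composition, so start at k = 1
    for k in range(1, max_k + 1):
        features += k_spaced_pairs(seq, k)
    hydropathy = [KYTE_DOOLITTLE[aa] for aa in seq]
    features += [semivariance(hydropathy, h) for h in range(1, max_lag + 1)]
    return features


if __name__ == "__main__":
    vector = encode("MKTAYIAKQRQISFVKSHFSRQ")
    print(len(vector))  # 1 + 20 + 400 + 8000 + 3 * 400 + 5 = 9626
```

With the defaults, each sequence yields 9626 values, most of them zero for a short protein because the 8000 triplet frequencies are sparse. This sparsity is one reason a screening step is needed before fitting a regression model on roughly a hundred samples.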
Pages: 1091-1098
Number of pages: 8