SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines

Cited by: 87
Authors
Cao, Renzhi [1 ]
Wang, Zheng [2 ]
Wang, Yiheng [2 ]
Cheng, Jianlin [1 ]
Affiliations
[1] Univ Missouri, Christopher S Bond Life Sci Ctr, Inst Informat, Dept Comp Sci, Columbia, MO 65211 USA
[2] Univ So Mississippi, Sch Comp, Hattiesburg, MS 39406 USA
Source
BMC BIOINFORMATICS | 2014 / Vol. 15
Funding
US National Science Foundation;
Keywords
SCORING FUNCTION; SECONDARY STRUCTURE; LOCAL QUALITY; RECOGNITION; MULTICOM; SERVER; PCONS; FOLD;
DOI
10.1186/1471-2105-15-120
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: It is important to predict the quality of a protein structural model before its native structure is known. Methods that can predict the absolute local quality of individual residues in a single protein model are rare, yet they are particularly needed for using, ranking and refining protein models. Results: We developed a machine learning tool (SMOQ) that predicts the distance deviation of each residue in a single protein model. SMOQ uses support vector machines (SVMs) with protein sequence and structural features (i.e., the basic feature set), including amino acid sequence, secondary structures, solvent accessibilities, and residue-residue contacts, to make predictions. We also trained an SVM model with two additional features (profiles and SOV scores) on 20 CASP8 targets and found that including them improves performance only when the real deviation between the native structure and the model is higher than 5 angstrom. The released SMOQ tool uses the basic feature set trained on 85 CASP8 targets. Moreover, SMOQ implements a way to convert predicted local quality scores into a global quality score. SMOQ was tested on the 84 CASP9 single-domain targets. The average difference between the residue-specific distance deviation predicted by our method and the actual distance deviation on the test data is 2.637 angstrom. The global quality prediction accuracy of the tool is comparable to that of other good tools on the same benchmark. Conclusion: SMOQ is a useful tool for protein single model quality assessment. Its source code and executable are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/.
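The abstract mentions two computations without giving formulas: the average difference between predicted and actual per-residue deviations, and a conversion from local scores to a global score. The sketch below illustrates one way each could be computed. It is not taken from the SMOQ source code: the function names are hypothetical, the first function assumes "average difference" means mean absolute difference, and the second assumes an S-score-style average with a 5 angstrom threshold, chosen here purely for illustration.

```python
from typing import Sequence


def mean_absolute_difference(predicted: Sequence[float],
                             actual: Sequence[float]) -> float:
    """Average absolute gap (in angstrom) between predicted and actual
    per-residue distance deviations. Assumed reading of 'average difference'."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def global_quality(deviations: Sequence[float], threshold: float = 5.0) -> float:
    """Map per-residue deviations to a single score in (0, 1].

    Assumption: each residue contributes 1 / (1 + (d / threshold)^2), so a
    residue with zero deviation contributes 1 and the contribution falls
    off as the deviation grows. The threshold value is illustrative.
    """
    if len(deviations) == 0:
        raise ValueError("deviations must be non-empty")
    terms = (1.0 / (1.0 + (d / threshold) ** 2) for d in deviations)
    return sum(terms) / len(deviations)


if __name__ == "__main__":
    predicted = [0.0, 5.0]   # hypothetical predicted deviations, angstrom
    actual = [1.0, 3.0]      # hypothetical true deviations, angstrom
    print(mean_absolute_difference(predicted, actual))
    print(global_quality(predicted))
```

With this form, a model whose residues are all predicted to deviate by 0 angstrom scores 1, and larger predicted deviations pull the global score toward 0.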
Pages: 8