Large-scale model quality assessment for improving protein tertiary structure prediction

Times cited: 40
Authors
Cao, Renzhi [1]
Bhattacharya, Debswapna [1]
Adhikari, Badri [1]
Li, Jilong [1]
Cheng, Jianlin [1,2,3]
Affiliations
[1] Univ Missouri, Dept Comp Sci, Columbia, MO 65211 USA
[2] Univ Missouri, Inst Informat, Columbia, MO 65211 USA
[3] Univ Missouri, C Bond Life Sci Ctr, Columbia, MO 65211 USA
Keywords
SECONDARY STRUCTURE; MULTICOM; SINGLE; RECOGNITION; FRAGMENTS; SEQUENCES; FEATURES; FOLD
DOI
10.1093/bioinformatics/btv235
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Sampling structural models and ranking them are the two major challenges of protein structure prediction. Traditional protein structure prediction methods generally use one or a few quality assessment (QA) methods to select the best predicted models, but these methods cannot consistently select the better models or rank a large number of models well.
Results: Here, we develop a novel large-scale model QA method that works in conjunction with model clustering to rank and select protein structural models. It applies an unprecedented 14 model QA methods to generate consensus model rankings, followed by model refinement based on model combination (i.e. averaging). Our experiments demonstrate that the large-scale model QA approach selects models of better quality more consistently and robustly than any individual QA method. Our method was blindly tested as the MULTICOM group during the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) in 2014. It was officially ranked third out of all 143 human and server predictors according to the total scores of the first models predicted for 78 CASP11 protein domains, and second according to the total scores of the best of the five models predicted for these domains. MULTICOM's strong performance in the highly competitive CASP11 experiment shows that our large-scale QA approach, together with model clustering, is a promising solution to one of the two major problems in protein structure modeling.
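The two steps named in the abstract, consensus ranking across many QA methods followed by model combination by averaging, can be sketched in Python. This is a minimal illustration under stated assumptions, not the MULTICOM implementation: it assumes each QA method outputs one score per model where higher is better, aggregates the per-method rankings by mean rank (a Borda-style choice made here for illustration; the paper's exact aggregation may differ), and averages the coordinates of top-ranked models that are assumed to be already superimposed and to list atoms in the same order. The function names `rank_models`, `consensus_ranking`, and `average_models`, the QA method labels, and all scores and coordinates are hypothetical. Model clustering is omitted.

```python
from statistics import mean


def rank_models(scores):
    """Rank models for one QA method: rank 1 = highest score.
    Ties are broken by model ID so the ranking is deterministic."""
    ordered = sorted(scores, key=lambda m: (-scores[m], m))
    return {model: rank for rank, model in enumerate(ordered, start=1)}


def consensus_ranking(qa_scores):
    """Return model IDs sorted best-first by their mean rank across all
    QA methods (a Borda-style aggregation assumed for illustration).
    Assumes every QA method scored the same set of models."""
    per_method = [rank_models(scores) for scores in qa_scores.values()]
    models = per_method[0].keys()
    mean_rank = {m: mean(ranks[m] for ranks in per_method) for m in models}
    return sorted(mean_rank, key=lambda m: (mean_rank[m], m))


def average_models(models):
    """Average coordinates atom by atom. Each model is a list of (x, y, z)
    tuples; assumes the models are already superimposed onto a common
    frame and list their atoms in the same order."""
    n_atoms = len(models[0])
    if any(len(model) != n_atoms for model in models):
        raise ValueError("all models must have the same number of atoms")
    return [
        tuple(mean(model[i][axis] for model in models) for axis in range(3))
        for i in range(n_atoms)
    ]


if __name__ == "__main__":
    # Made-up scores from three hypothetical QA methods for three models.
    qa_scores = {
        "qa_a": {"m1": 0.62, "m2": 0.55, "m3": 0.70},
        "qa_b": {"m1": 0.58, "m2": 0.60, "m3": 0.66},
        "qa_c": {"m1": 0.71, "m2": 0.50, "m3": 0.69},
    }
    ranking = consensus_ranking(qa_scores)
    print(ranking)  # ['m3', 'm1', 'm2']

    # Made-up, pre-superimposed two-atom coordinates for the top two models.
    coords = {
        "m3": [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)],
        "m1": [(2.0, 0.0, 0.0), (3.0, 2.0, 1.0)],
    }
    print(average_models([coords[m] for m in ranking[:2]]))
    # [(1.0, 0.0, 0.0), (2.0, 2.0, 2.0)]
```

Mean rank is only one way to combine rankings; averaging raw scores or using z-scores are equally plausible alternatives. A coordinate-averaged model can also have distorted local geometry, so in practice it would normally be passed to a refinement step before use.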
Pages: 116-123
Number of pages: 8