Large-scale model quality assessment for improving protein tertiary structure prediction

Cited by: 40

Authors:

Cao, Renzhi ^{[1]}

Bhattacharya, Debswapna ^{[1]}

Adhikari, Badri ^{[1]}

Li, Jilong ^{[1]}

Cheng, Jianlin ^{[1,2,3]}

Affiliations:

[1] Univ Missouri, Dept Comp Sci, Columbia, MO 65211 USA

[2] Univ Missouri, Inst Informat, Columbia, MO 65211 USA

[3] Univ Missouri, C Bond Life Sci Ctr, Columbia, MO 65211 USA

Source:

BIOINFORMATICS | 2015 / Vol. 31 / Issue 12

Keywords:

SECONDARY STRUCTURE; MULTICOM; SINGLE; RECOGNITION; FRAGMENTS; SEQUENCES; FEATURES; FOLD;

DOI:

10.1093/bioinformatics/btv235

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Motivation: Sampling structural models and ranking them are the two major challenges of protein structure prediction. Traditional protein structure prediction methods generally use one or a few quality assessment (QA) methods to select the best-predicted models; such approaches cannot consistently select the relatively better models or rank a large number of models well.

Results: Here, we develop a novel large-scale model QA method that works in conjunction with model clustering to rank and select protein structural models. In an unprecedented step, it applies 14 model QA methods to generate consensus model rankings, followed by model refinement based on model combination (i.e. averaging). Our experiments demonstrate that the large-scale model QA approach is more consistent and robust in selecting models of better quality than any individual QA method. Our method was blindly tested during the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as the MULTICOM group. It was officially ranked third out of all 143 human and server predictors according to the total scores of the first models predicted for 78 CASP11 protein domains, and second according to the total scores of the best of the five models predicted for these domains. MULTICOM's outstanding performance in the extremely competitive 2014 CASP11 experiment proves that our large-scale QA approach, together with model clustering, is a promising solution to one of the two major problems in protein structure modeling.
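The idea of combining many QA methods into a consensus ranking can be illustrated with a minimal sketch. The abstract does not state how the 14 rankings are merged, so the mean-rank rule used here is an assumption, and the function name `consensus_ranking`, the method names and the scores are all hypothetical illustrations rather than the MULTICOM implementation.

```python
from statistics import mean


def consensus_ranking(scores_by_method):
    """Rank models by their average rank across several QA methods.

    scores_by_method maps a QA method name to a dict of
    {model_name: score}, where a higher score means a better model.
    Assumption: every method scores the same set of models, and
    mean rank is used as the consensus rule (not taken from the paper).
    """
    models = sorted(next(iter(scores_by_method.values())))
    ranks = {model: [] for model in models}

    for scores in scores_by_method.values():
        # Order models from best to worst according to this QA method.
        ordered = sorted(models, key=lambda m: scores[m], reverse=True)
        for rank, model in enumerate(ordered, start=1):
            ranks[model].append(rank)

    average_rank = {model: mean(r) for model, r in ranks.items()}
    # Lower average rank is better; the model name breaks ties.
    return sorted(models, key=lambda m: (average_rank[m], m))


# Hypothetical scores from three QA methods for three candidate models.
example_scores = {
    "qa_a": {"model_1": 0.9, "model_2": 0.5, "model_3": 0.7},
    "qa_b": {"model_1": 0.6, "model_2": 0.4, "model_3": 0.8},
    "qa_c": {"model_1": 0.8, "model_2": 0.3, "model_3": 0.5},
}
print(consensus_ranking(example_scores))
```

In this toy input, no single method has to be trusted on its own: `qa_b` prefers `model_3`, but the other two methods outvote it, which is the kind of robustness the abstract attributes to a large pool of QA methods.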


Pages: 116 / 123

Page count: 8
